Preface



Intelligent agents are one of the most important developments in computer 
science in the 1990s. Agents are of interest in many important application areas, 
ranging from human-computer interaction to industrial process control. The 
ATAL workshop series aims to bring together researchers interested in the core 
aspects of agent technology. Specifically, ATAL addresses issues such as
theories of agency, software architectures for intelligent agents, methodologies and
programming languages for realizing agents, and software tools for developing 
and evaluating agent systems. One of the strengths of the ATAL workshop series 
is its emphasis on the synergies between theories, infrastructures, architectures, 
methodologies, formal methods, and languages. 

This year’s workshop continued the ATAL trend of attracting a large
number of high-quality submissions. In total, 75 papers from 19 countries were
submitted to the ATAL-99 workshop. After stringent reviewing, 22 papers
were accepted for presentation at the workshop. After the workshop, these papers 
were revised on the basis of comments received both from the original reviewers 
and from discussions at the workshop itself. This volume contains these revised 
papers. 

As with previous workshops in the series, we chose to emphasize what we
perceive as an important new trend in agent-based computing. In this case,
we were motivated by the observation that the technology of intelligent agents
and multi-agent systems is beginning to migrate from research labs to software
engineering centers. As the rate of this migration increases, it is becoming
increasingly apparent that we must develop principled techniques for analyzing,
specifying, designing, and verifying agent-based systems. Without such
techniques, agent technology will simply not realize its full potential. Consequently,
the ATAL-99 program placed particular emphasis on agent-oriented software
engineering and the evaluation of agent architectures. Besides several papers in
each of these special tracks, the program also featured two associated panels
(organized by Mike Wooldridge and Jörg Müller, respectively). Other highlights of
this year’s program were the invited talks by leading exponents of agent research:

Theories: John Pollock, “Rational Cognition in OSCAR”

Architectures: Sarit Kraus, “Agents for Information Broadcasting”

It is both our hope and our expectation that this volume will be as useful 
to the agent research and development community as its five predecessors have 
proved to be. We believe that ATAL, and the Intelligent Agents series of which 
these proceedings will form a part, play a crucial role in a rapidly developing 
field, by focusing specifically on the relationships between the theory and
practice of agents. Only through understanding these relationships can agent-based
computing mature and achieve its widely predicted potential. 
Introduction



Like its predecessors [4, 5, 2, 3, 1], this volume of the Intelligent Agents series
focuses on the relationships between the theory and the practice of intelligent
autonomous agents. To this end, the volume is divided into five sections,
reflecting the major current research and development trends in the field. Section I
presents work on agent theories. Section II discusses work on architectures for
single agents and architectures for entire systems. Section III deals with a
variety of agent languages. Section IV presents work on agent-oriented software
engineering. Finally, Section V deals with agents making decisions while taking
the presence of other agents into account.



Section I: Agent Theories 

Wooldridge and Lomuscio address the problem of giving a grounded semantics to
attributions of knowledge and perception attitudes. In their semantics,
the truth conditions of these attitudes are defined in terms of actual states of 
the agent’s internal architecture and their relation to environment states. The 
logic allows one to model the relation between what is objectively true in the 
environment, what is visible to the agent, what the agent actually perceives, and 
what it knows. 

Lomuscio and Ryan discuss how modal logic can be used to model how
knowledge may be shared between agents in a multi-agent system. They examine
various axioms about relations between nested knowledge, i.e., knowledge one
agent has about what another agent knows. They classify the logical systems
that result when various axioms about such knowledge sharing are adopted,
specify the associated semantics, and prove completeness results for a large number
of cases.

Isozaki and Katsuno also deal with nested belief, looking more specifically at 
nested belief change. They propose a representation that handles many cases of 
nested beliefs and present an efficient algorithm for updating these beliefs taking 
into account observability, memory, and the effects of action. The algorithm’s 
soundness with respect to the logic is proven. 

Wobcke examines how agent programs can be specified and proven correct. 
He looks specifically at programs based on the PRS architecture. He proposes 
a methodology for writing such programs and develops a formalism based on 
dynamic logic and context-based reasoning for proving properties about them. 

Xuan and Lesser address the problem of coordination between autonomous
agents. In particular, they identify the notion of commitment as a key
consideration. By their very nature, commitments between agents involve a significant
degree of uncertainty, and such uncertainty needs to be taken into account when an
agent performs its planning and scheduling activities. To this end, a framework
for incorporating uncertainty into commitments is presented and a concomitant
negotiation framework for handling such commitments is developed.
Section II: Agent and System Architectures

Pollock discusses his work on rationality and its embodiment in the OSCAR 
agent architecture. In particular, he focuses on the design of an agent that can 
make decisions and draw conclusions that are rational when judged against
human standards of rationality.

David and Kraus describe the design, implementation and evaluation of an 
information broadcasting system. In particular, they focus on ways in which 
agents can characterize users’ needs and use these characterizations to ensure 
information is broadcast in the most efficient manner possible. 

Hexmoor et al. report on the panel on the evaluation of agent architectures 
that took place at the workshop. A number of key issues in this area are raised 
and then each panel member presents his response. 

Wallace and Laird present their ongoing work towards developing a
methodology for comparing and evaluating agent architectures. They explore the issues
involved in architecture evaluation and identify a number of potential
evaluation strategies. Eventually they opt for an approach that involves identifying the
fundamental properties required by intelligent agents. Their approach is
demonstrated by applying it to the Soar and CLIPS architectures.

Lee also addresses the problem of evaluating agent architectures. In contrast to 
Wallace and Laird, he focuses on a particular class of domains: namely, reactive 
systems, i.e. systems in which an agent must maintain an ongoing interaction 
with a dynamically changing environment. For this class of system, he identifies 
a range of necessary features that an agent must possess. The availability (or 
not) of these features is then used to rate a number of common architectures 
that have appeared in the literature. 

Paolucci et al. describe a planning component for agents that operate in a
dynamic environment, have only partial knowledge of this environment, and must
cooperate with each other. The component is part of the RETSINA multi-agent 
architecture. The planner, which uses a hierarchical task network representation, 
interleaves planning with action execution and monitoring the environment for 
changes that might invalidate a plan. When a plan cannot be completed due 
to lack of information, planning may be suspended while information gathering 
actions are executed. 

Shehory addresses the problem of agents finding information and services in
large-scale, open systems. The problems of the traditional approach to this task
(i.e., matchmakers, brokers, yellow pages agents and other forms of middleware
agents) are highlighted. Then, an approach is advocated in which each individual
agent caches a list of acquaintances. This approach is shown to enable agents to 
locate one another without the need for middleware agents and to ensure that 
the associated communication complexity is fairly low. 

Section III: Agent Languages 

Lespérance et al. present an approach to the development of robot controllers
that have “high-level reactivity” using the ConGolog agent programming
language. Such controllers maintain a model of the environment and can react to
environmental events or exceptional conditions such as the failure of a robot
action to achieve its objectives. They do so by suspending or abandoning the
current plan and selecting a new plan that is an appropriate response to the
event or exception.

Baral and Son extend the ConGolog agent programming language with a 
construct that supports hierarchical task networks. This allows plans that involve 
a partial order over actions to be easily specified. It also bridges the gap between 
the procedural model of ConGolog and that of rule-based agent languages. 

Ferber and Gutknecht show how one can develop a formal semantics for an 
architectural model of open multi-agent systems based on the notions of role and 
group. In this model, roles represent functions within a group and an agent can 
play multiple roles. The semantics formalizes the mechanisms of group creation 
and admission to a group, and constrains communication between processes 
belonging to different groups. 

Van Eijk et al. extend their work on a programming language for multi-agent 
systems to support the communication of propositional information between 
agents and the integration of new agents into an open system. A formal semantics 
based on transition systems is presented. 

Pynadath et al. concentrate on the problem of programming autonomous
agents to act as teams. They develop an abstract and domain-independent
framework for specifying an agent’s behavior when it acts as part of a team. This
teamwork layer can be added as a form of meta-controller to the individual
agents so that they become team-enabled. This approach simplifies and speeds
up the process of building cooperating agents, since it allows a significant amount
of design experience and code to be reused.



Section IV: Agent-Oriented Software Engineering 

Bussmann et al. present a brief review of the issues raised in the workshop’s 
panel on agent-oriented software engineering. 

Ciancarini et al. present a more detailed response to the issues raised in this 
panel. In particular, they highlight the advantages of adopting an approach based 
on coordination models. 

Sabater et al. present a logic-based approach for specifying and implementing 
intelligent agents. They exploit the notion of multi-context systems to develop 
modular specifications for a range of agent architectures. They go on to show 
how such specifications can be made operational through the development of an 
appropriate execution model. 

Busetta et al. describe an approach to agent design that involves the
composition of reusable modules called capabilities, which encapsulate related beliefs,
events, and plans. Intentions are still posted to a global structure, and agents
retain control over which to pursue according to the circumstances. The approach
is implemented within a Java-based, PRS-like framework called JACK.

Graham and Decker discuss the internal architecture of DECAF (Distributed
Environment Centred Agent Framework). DECAF is a toolkit that supports the design
and development of intelligent agents. It provides a range of built-in facilities
that agent designers can exploit in order to rapidly prototype their applications.
These facilities include communication, planning, scheduling, execution
monitoring, and coordination.



Section V: Decision Making in a Social Context 

Hogg and Jennings tackle the problem of an autonomous agent making
decisions in a social setting. They present a formal framework that enables the social
implications of an agent’s decisions to be assessed. In particular, agents can
dynamically vary the degree to which they balance individual utility maximization
and social welfare maximization. They then empirically evaluate a range of
social decision-making functions and show which of them are successful in what
sorts of domains.

Boella et al. also address the problem of how to make local decisions
that benefit a group of cooperating agents. In particular, they
develop a decision-theoretic approach to planning that deals with coordination
as a team.

Wagner and Lesser present a framework in which an agent’s organizational 
context can be explicitly represented and reasoned about. They show how an 
agent’s knowledge structures need to be extended to incorporate information 
about the organizational structure in which it is embedded and then indicate 
how an agent’s control regime needs to be modified to take this context into 
account. 

Brainov analyses the impact of an agent’s preferences on the types of
interactions in which it engages. In particular, the assumption that self-interest is
the optimal decision policy for an autonomous agent is challenged. Indeed, a 
number of instances where self-interested behavior leads to inefficient outcomes 
are presented. 

Castelfranchi et al. discuss the roles that norms can play in a society of
autonomous agents. They present a set of principles that define how an agent should
behave with respect to both norm adherence and norm violation. The
embodiment of these principles within an agent architecture is then detailed and their
impact upon an agent’s reasoning process is outlined.
Abstract. Although many formalisms have been proposed for reasoning about
intelligent agents, few of these have been semantically grounded in a concrete
computational model. This paper presents VSK logic, a formalism for reasoning
about multi-agent systems, in which the semantics are grounded in a general,
finite-state-machine-like model of agency. VSK logic allows us to represent:
what is objectively true of the environment; what is visible, or knowable, about
the environment; what the agent perceives of the environment; and finally, what
the agent actually knows about the environment. VSK logic is an extension of
modal epistemic logic. The possible relationships between what is true, visible,
perceived, and known are discussed and characterised in terms of the architectural
properties of agents that they represent. Some conclusions and issues are then
discussed.



1 Introduction 

Many formalisms have been proposed for reasoning about intelligent agents and
multi-agent systems [16]. However, most such formalisms are ungrounded, in the sense that
while they have a mathematically well-defined semantics, these semantics cannot be 
given a computational interpretation. This throws doubt on the claim that such logics 
can be useful for reasoning about computational agent systems. 

One formalism that does not fall prey to this problem is epistemic logic — the modal 
logic of knowledge [5]. Epistemic logic is computationally grounded in that it has a 
natural interpretation in terms of the states of computer processes. Epistemic logic can 
be seen as a tool with which to represent and reason about what is objectively true of 
a particular environment and the information that agents populating this environment 
have about it. 

Although epistemic logic has proved to be a powerful tool with which to reason about 
agents and multi-agent systems, it is not expressive enough to capture certain key aspects 
of agents and their environments. First, there is in general a distinction between what 
is instantaneously true of an environment and what is knowable or visible about it. To 
pick an extreme example, suppose p represents the fact that the temperature at the north
pole of Mars is 200 K. Now it may be that, as we write, p is true of the physical world,
but the laws of physics prevent us from having immediate access to this information. In
this example, something is true in the environment, but this information is inaccessible.
Traditional epistemic logics can represent p itself, and also allow us to represent the fact 
that the agent does not know p. But there is no way of distinguishing in normal modal 
logic between information that is both true and accessible, and statements that are true 
but not accessible. Whether or not a property is accessible in some environment will 
have a significant effect on the design of agents to operate in that environment. 

In a similar way, we can distinguish between information that is accessible in an 
environment state, and the information an agent actually perceives of that environment 
state. For example, it may be that a particular fact is knowable about some environment, 
but that the agent’s sensors are not capable of perceiving this fact. Again, the relationship 
between what is knowable about an environment and what an agent actually perceives of 
it has an impact on agent design. Finally, we can also distinguish between the information 
that an agent’s sensors carry and the information that the agent actually carries in its 
state, i.e., its knowledge. 

In this paper, we present a formalism called VSK logic, which allows us to capture
these distinctions. VSK logic allows us to represent what is objectively true of the
environment, information that is visible, or knowable, about the environment, information
the agent perceives of the environment, and finally, what the agent actually knows about
the environment. VSK logic is an extension of modal epistemic logic. The underlying
semantic model is closely related to the interpreted systems model, which is widely used
to give a semantics to modal epistemic logic [5, pp. 103-114]. A key contribution of VSK
logic is that possible relationships between what is true, visible, perceived, and known
are characterised model theoretically in terms of the architectural properties of agents
that they correspond to.

The remainder of this paper is structured as follows. First, the formal model that
underpins VSK logic is presented. In the sections that follow, the logic itself is developed,
and some systems of VSK logic are discussed. An example is presented, illustrating the
formalism. Finally, related work is presented, along with some conclusions, and some 
open issues are briefly discussed. We begin, in the following section, by presenting the 
underlying semantic model. 



2 A Formal Model 

In this section, we present a simple formal model of agents and the environments they
occupy; see Figure 1 (cf. [6, pp. 307-313]). We start by introducing the basic sets
used in our formal model. First, it is assumed that the environment may be in any of a
set $E = \{e, e', \ldots\}$ of instantaneous states, and that the (single) agent occupying this
environment may be in any of a set $L = \{l, l', \ldots\}$ of local states. Agents are assumed
to have a repertoire of possible actions available to them, which transform the state
of the environment; we let $Ac = \{a, a', \ldots\}$ be the set of actions. We assume a
distinguished member $null$ of $Ac$, representing the “noop” action, which has no effect
on the environment.

In order to represent the effect that an agent’s actions have on an environment, we
introduce a state transformer function, $\tau : E \times Ac \to E$ (cf. [5, p. 154]). Thus $\tau(e, a)$
denotes the environment state that would result from performing action $a$ in environment
state $e$. Note that our environments are deterministic: there is no uncertainty about the
result of performing an action in some state. Dropping this assumption is not problematic,
but it does make the formalism somewhat more convoluted.

Fig. 1. An overview of the framework.

In order to represent what is knowable about the environment, we use a visibility
function, $v : E \to \wp(E) \setminus \{\emptyset\}$. The idea is that if the environment is actually in state
$e$, then it is impossible for any agent in the environment to distinguish between $e$ and
any member of $v(e)$. We require that $v$ partitions $E$ into mutually disjoint sets of states,
and that $e \in v(e)$, for all $e \in E$. For example, suppose $v(e_4) = \{e_2, e_3, e_4\}$. Then the
intuition is that the agent would be unable to distinguish between $e_2$ and $e_3$, or between
$e_2$ and $e_4$. Note that visibility functions are not intended to capture the everyday notion
of visibility, as in “object x is visible to the agent”.

We will say $v$ is transparent if $v(e) = \{e\}$ for all $e \in E$. Intuitively, if $v$ is transparent, then it
will be possible for an agent observing the environment to distinguish every different
environment state.
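For a finite set of environment states, both requirements on $v$ and the transparency property can be checked mechanically. The Python sketch below is illustrative only. It assumes $E$ is finite and represents $v$ as a function that returns frozensets. The helper names `is_visibility_function` and `is_transparent`, along with the example states, are hypothetical and do not come from the paper.

```python
from typing import Callable, FrozenSet, Hashable, Iterable

Vis = Callable[[Hashable], FrozenSet[Hashable]]

def is_visibility_function(E: Iterable[Hashable], v: Vis) -> bool:
    """Check that e is in v(e) for every e, and that the sets v(e) partition E."""
    states = set(E)
    for e in states:
        cell = v(e)
        if e not in cell or not cell <= states:
            return False
        # Every member of the cell must have exactly this cell. Together with
        # e in v(e), this rules out overlapping cells, so the cells partition E.
        if any(v(e2) != cell for e2 in cell):
            return False
    return True

def is_transparent(E: Iterable[Hashable], v: Vis) -> bool:
    """v is transparent iff v(e) = {e} for every environment state e."""
    return all(v(e) == frozenset({e}) for e in E)

# Hypothetical example: e2, e3 and e4 are mutually indistinguishable.
E = ["e1", "e2", "e3", "e4"]
cells = [frozenset({"e1"}), frozenset({"e2", "e3", "e4"})]

def v(e):
    return next(c for c in cells if e in c)

print(is_visibility_function(E, v), is_transparent(E, v))  # True False
```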

Formally, an environment $Env$ is a 4-tuple $(E, \tau, v, e_0)$, where $E$ is a set of
environment states as above, $\tau$ is a state transformer function, $v$ is a visibility function, and
$e_0 \in E$ is the initial state of $Env$.

From Figure 1, we can see that an agent has three functional components, representing
its sensors (the function $see$), its next state function ($next$), and its action selection, or
decision making function ($do$). Formally, the perception function $see : \wp(E) \to P$ maps
sets of environment states to percepts; we denote members of $P$ by $p, p', \ldots$. The
agent’s next state function $next : L \times P \to L$ maps an internal state and percept to an
internal state; and the action-selection function $do : L \to Ac$ simply maps internal states
to actions.

The behaviour of an agent can be summarised as follows. The agent starts in some
state $l_0$. It then observes its environment state $e_0$ through the visibility function, $v(e_0)$,
and generates a percept $see(v(e_0))$. The internal state of the agent is then updated to
$next(l_0, see(v(e_0)))$. The action selected by the agent is then $do(next(l_0, see(v(e_0))))$.
This action is performed, and the agent enters another cycle.
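The perceive/update/act cycle just described can be written as a single Python function. This is a minimal, hypothetical illustration and not the authors' formalism. The function name `agent_cycle` and the light-switch world are assumptions, as is the use of `nxt` for $next$, which avoids shadowing Python's built-in `next`.

```python
from typing import Callable, FrozenSet, Hashable, Tuple

def agent_cycle(
    e: Hashable,
    l: Hashable,
    tau: Callable[[Hashable, Hashable], Hashable],
    v: Callable[[Hashable], FrozenSet[Hashable]],
    see: Callable[[FrozenSet[Hashable]], Hashable],
    nxt: Callable[[Hashable, Hashable], Hashable],
    do: Callable[[Hashable], Hashable],
) -> Tuple[Hashable, Hashable, Hashable]:
    """One cycle: perceive v(e), update the local state, act.

    Returns (new environment state, updated local state, action performed).
    """
    percept = see(v(e))       # the agent observes v(e), never e itself
    l_new = nxt(l, percept)   # next : L x P -> L
    action = do(l_new)        # do : L -> Ac
    return tau(e, action), l_new, action

# Hypothetical two-state world: the agent switches the light on when it sees it off.
def tau(e, a):
    if a == "toggle":
        return "on" if e == "off" else "off"
    return e                  # "null" is the no-op action

def v(e):
    return frozenset({e})     # transparent visibility

def see(cell):
    return min(cell)          # percept: the (single) visible state

def nxt(l, p):
    return p                  # the local state just records the last percept

def do(l):
    return "toggle" if l == "off" else "null"

print(agent_cycle("off", "init", tau, v, see, nxt, do))  # ('on', 'off', 'toggle')
```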

Together, an environment/agent pair comprises a system. The global state of a system
at any time is a pair containing the state of the environment and the state of the agent.
Let $G = E \times L$ be the set of all such global states. We use $g$ (with annotations: $g, g', \ldots$)
to stand for members of $G$. A run of a system can be thought of as an infinite sequence:

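$$g_0 \xrightarrow{\alpha_0} g_1 \xrightarrow{\alpha_1} g_2 \xrightarrow{\alpha_2} g_3 \xrightarrow{\alpha_3} \cdots \xrightarrow{\alpha_{u-1}} g_u \xrightarrow{\alpha_u} \cdots$$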

A sequence $(g_0, g_1, g_2, \ldots)$ over $G$ represents a run of an agent $(see, next, do, l_0)$ in an
environment $(E, \tau, v, e_0)$ iff:

1. $g_0 = (e_0, next(l_0, see(v(e_0))))$; and

2. $\forall u \in \mathbb{N}$, if $g_u = (e, l)$ and $g_{u+1} = (e', l')$ then
$e' = \tau(e, do(l))$ and
$l' = next(l, see(v(e')))$.

Let $G_{Ag,Env} \subseteq G$ denote the set of global states that the system formed by $Env$ and $Ag$ could enter during
execution.
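The environments here are deterministic, so an agent/environment pair has exactly one run. As a result, $G_{Ag,Env}$ is just the set of states along that run. The sketch below assumes there are finitely many reachable global states. Under that assumption the run is eventually periodic and can be enumerated until the first repeated state. It follows the run condition $l' = next(l, see(v(e')))$. The function names and the light-switch world are hypothetical.

```python
from typing import Hashable, List, Set, Tuple

Global = Tuple[Hashable, Hashable]  # a global state g = (e, l)

def initial(e0, l0, v, see, nxt) -> Global:
    """Condition 1: g_0 = (e_0, next(l_0, see(v(e_0))))."""
    return (e0, nxt(l0, see(v(e0))))

def successor(g: Global, tau, v, see, nxt, do) -> Global:
    """Condition 2: e' = tau(e, do(l)) and l' = next(l, see(v(e')))."""
    e, l = g
    e2 = tau(e, do(l))
    return (e2, nxt(l, see(v(e2))))

def run_prefix(e0, l0, tau, v, see, nxt, do, steps: int) -> List[Global]:
    """The first `steps` (at least one) global states of the unique run."""
    out = [initial(e0, l0, v, see, nxt)]
    while len(out) < steps:
        out.append(successor(out[-1], tau, v, see, nxt, do))
    return out

def reachable(e0, l0, tau, v, see, nxt, do) -> Set[Global]:
    """G_{Ag,Env}, assuming finitely many reachable global states.

    The run is deterministic, so it is eventually periodic: stop at the first repeat.
    """
    seen: Set[Global] = set()
    g = initial(e0, l0, v, see, nxt)
    while g not in seen:
        seen.add(g)
        g = successor(g, tau, v, see, nxt, do)
    return seen

# Hypothetical light-switch world: the agent switches the light on when it sees it off.
def tau(e, a):
    if a == "toggle":
        return "on" if e == "off" else "off"
    return e  # "null" leaves the environment unchanged

def v(e):
    return frozenset({e})  # transparent visibility

def see(cell):
    return min(cell)

def nxt(l, p):
    return p  # the local state records the latest percept

def do(l):
    return "toggle" if l == "off" else "null"

print(run_prefix("off", "init", tau, v, see, nxt, do, 3))
# [('off', 'off'), ('on', 'on'), ('on', 'on')]
```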

In order to represent the properties of systems, we assume a set $\Phi = \{p, q, r, \ldots\}$
of primitive propositions. In order to interpret these propositions, we use a function
$\pi : \Phi \times G_{Ag,Env} \to \{T, F\}$. Thus $\pi(p, g)$ indicates whether proposition $p \in \Phi$ is true ($T$)
or false ($F$) in state $g \in G$. Note that members of $\Phi$ are assumed to express properties of
environment states only, and not the internal properties of agents. We also require that
any two different environment states differ in the valuation of at least one primitive proposition.

We refer to a triple $(Env, Ag, \pi)$ as a model; our models play the role of interpreted
systems in knowledge theory [5, p. 110]. We use $M$ (with annotations: $M', M_1, \ldots$) to
stand for models.

3 Truth and Visibility 

Now that we have the formal preliminaries in place, we can start to consider the
relationships that we discussed in Section 1. We progressively introduce a logic $\mathcal{L}$, which
will enable us to represent first what is true of the environment, then what is visible,
or knowable, of the environment, then what an agent perceives of the environment, and
finally, what it knows of the environment.

We begin by introducing the propositional logic fragment of $\mathcal{L}$, which allows us to
represent what is true of the environment. Propositional formulae of $\mathcal{L}$ are built up from
$\Phi$ using the classical logic connectives “$\wedge$” (and), “$\vee$” (or), “$\neg$” (not), “$\Rightarrow$” (implies), and
“$\Leftrightarrow$” (if, and only if), as well as logical constants for truth (“true”) and falsity (“false”).
We define the syntax and semantics of the truth constant, disjunction, and negation, and
assume the remaining connectives and constants are introduced as abbreviations in the
conventional way. Formally, the syntax of the propositional fragment of $\mathcal{L}$ is defined by
the following grammar:



⟨wff⟩ ::= true | any element of $\Phi$ | $\neg$⟨wff⟩ | ⟨wff⟩ $\vee$ ⟨wff⟩
The semantics are defined via the satisfaction relation “$\models$”:

$(M, g) \models true$

$(M, g) \models p$ iff $\pi(p, g) = T$ (where $p \in \Phi$)

$(M, g) \models \neg\varphi$ iff not $(M, g) \models \varphi$

$(M, g) \models \varphi \vee \psi$ iff $(M, g) \models \varphi$ or $(M, g) \models \psi$

We will assume the conventional definitions of satisfiability, validity, and validity in a 
model. 
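The propositional satisfaction clauses translate directly into a recursive evaluator. This is a hypothetical Python sketch with several simplifications. Formulas are tuples. The valuation $\pi$ returns a Python `bool` in place of $T$/$F$. `And` and `Implies` are defined as abbreviations, mirroring the text. None of these names come from the paper.

```python
from typing import Callable, Hashable, Tuple

Formula = Tuple  # ("true",) | ("prop", name) | ("not", f) | ("or", f, g)

def sat(pi: Callable[[str, Hashable], bool], g: Hashable, f: Formula) -> bool:
    """Decide (M, g) |= f for the propositional fragment; pi is the valuation."""
    tag = f[0]
    if tag == "true":
        return True
    if tag == "prop":
        return pi(f[1], g)
    if tag == "not":
        return not sat(pi, g, f[1])
    if tag == "or":
        return sat(pi, g, f[1]) or sat(pi, g, f[2])
    raise ValueError(f"unknown connective {tag!r}")

# The remaining connectives, introduced as abbreviations.
def And(a: Formula, b: Formula) -> Formula:
    return ("not", ("or", ("not", a), ("not", b)))

def Implies(a: Formula, b: Formula) -> Formula:
    return ("or", ("not", a), b)

# Hypothetical valuation: atoms describe environment states only,
# so pi inspects just the environment component e of g = (e, l).
TRUE_AT = {"hot": {"p"}, "cold": {"q"}}

def pi(name: str, g: Tuple[str, str]) -> bool:
    e, _l = g
    return name in TRUE_AT[e]

g = ("hot", "l0")
p, q = ("prop", "p"), ("prop", "q")
print(sat(pi, g, Implies(p, q)), sat(pi, g, And(p, ("not", q))))  # False True
```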

We now enrich $\mathcal{L}$ by the addition of a unary modality “$\mathcal{V}$”, which will allow us
to represent the information that is instantaneously visible or knowable about an
environment state. Thus suppose the formula $\mathcal{V}\varphi$ is true in some state $g \in G$. The intended
interpretation of this formula is that the property $\varphi$ is knowable of the environment when
it is in state $g$; in other words, that an agent equipped with suitable sensory apparatus
would be able to perceive the information $\varphi$. If $\neg\mathcal{V}\varphi$ were true in some state, then no
agent, no matter how good its sensory apparatus was, would be able to perceive $\varphi$.

Note that our concept of visibility is distinct from the everyday notion of visibility 
as in “object o is visible to the agent”. If we were interested in capturing this notion of 
visibility we could use a first-order logic predicate along the lines of $visible(x, y, o)$ to
represent the fact that when an agent is in position $(x, y)$, object $o$ is visible. The arguments
to such visibility statements are terms, whereas the argument to the visibility modality in
$\mathcal{V}\varphi$ is a proposition.

In order to give a semantics to the $\mathcal{V}$ operator, we define a binary visibility accessibility
relation $\sim_v \subseteq G_{Ag,Env} \times G_{Ag,Env}$ as follows: $(e, l) \sim_v (e', l')$ iff $e' \in v(e)$. Since $v$
partitions $E$, it is easy to see that $\sim_v$ is an equivalence relation. The semantic rule for
the $\mathcal{V}$ modality is given in terms of the relation $\sim_v$ in the standard way for possible
worlds semantics: $(M, (e, l)) \models \mathcal{V}\varphi$ iff $(M, (e', l')) \models \varphi$ for all $(e', l') \in G_{Ag,Env}$ such
that $(e, l) \sim_v (e', l')$. As $\sim_v$ is an equivalence relation, the $\mathcal{V}$ modality has a logic of
S5 [5]. In other words, formula schemas (1)-(5) are valid in every model:



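$$\mathcal{V}(\varphi \Rightarrow \psi) \Rightarrow ((\mathcal{V}\varphi) \Rightarrow (\mathcal{V}\psi)) \tag{1}$$

$$\mathcal{V}\varphi \Rightarrow \neg\mathcal{V}\neg\varphi \tag{2}$$

$$\mathcal{V}\varphi \Rightarrow \varphi \tag{3}$$

$$\mathcal{V}\varphi \Rightarrow \mathcal{V}(\mathcal{V}\varphi) \tag{4}$$

$$\neg\mathcal{V}\varphi \Rightarrow \mathcal{V}\neg\mathcal{V}\varphi \tag{5}$$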



We omit the (by now standard) proof of this result; see, e.g., [5, pp. 58-59].
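Because $\mathcal{V}$ is interpreted over $\sim_v$, the S5 schemas can be spot-checked by brute force on a small finite model. The following sketch illustrates this under stated assumptions and is not a proof. It assumes every pair in $E \times L$ is reachable and uses a hypothetical three-state environment. It only checks instances of schemas (3)-(5) whose argument is an atom. The function and variable names are made up for illustration.

```python
from itertools import product

def make_sat(G, v, pi):
    """Return sat(g, f) deciding (M, g) |= f for true, atoms, not, or and V."""
    def sat(g, f) -> bool:
        tag = f[0]
        if tag == "true":
            return True
        if tag == "prop":
            return pi(f[1], g)
        if tag == "not":
            return not sat(g, f[1])
        if tag == "or":
            return sat(g, f[1]) or sat(g, f[2])
        if tag == "V":  # f[1] must hold at every (e', l') in G with e' in v(e)
            return all(sat(h, f[1]) for h in G if h[0] in v(g[0]))
        raise ValueError(f"unknown connective {tag!r}")
    return sat

def Implies(a, b):
    return ("or", ("not", a), b)

# Hypothetical model: e2 and e3 are indistinguishable; every pair in E x L is assumed reachable.
CELLS = [frozenset({"e1"}), frozenset({"e2", "e3"})]
TRUE_AT = {"e1": {"p"}, "e2": {"p", "q"}, "e3": set()}

def v(e):
    return next(c for c in CELLS if e in c)

def pi(name, g):
    return name in TRUE_AT[g[0]]  # atoms depend on the environment state only

G = list(product(["e1", "e2", "e3"], ["l0", "l1"]))
sat = make_sat(G, v, pi)

p = ("prop", "p")
print(sat(("e2", "l0"), p), sat(("e2", "l0"), ("V", p)))  # True False: true but not visible

schemas = [
    lambda f: Implies(("V", f), f),                                  # (3)
    lambda f: Implies(("V", f), ("V", ("V", f))),                    # (4)
    lambda f: Implies(("not", ("V", f)), ("V", ("not", ("V", f)))),  # (5)
]
atoms = [("prop", "p"), ("prop", "q")]
print(all(sat(g, s(f)) for s in schemas for f in atoms for g in G))  # True
```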

Formula schema (3) captures the first significant interaction between what is true
and what is visible. However, we can also consider the converse of this implication:

$$\varphi \Rightarrow \mathcal{V}\varphi \tag{6}$$

This schema says that if $\varphi$ is true of an environment, then $\varphi$ is knowable. We can
characterise this schema in terms of the environment’s visibility function: formula schema
(6) is valid in a model iff the visibility function of that model is transparent. Thus in
transparent environments, visibility collapses to truth, since $\varphi \Leftrightarrow \mathcal{V}\varphi$ will be valid in
such environments. In other words, everything true in a transparent environment is also
visible, and vice versa. Note that we consider this a helpful property of environments;
in the terminology of [13], such environments are accessible. Unfortunately, most
environments do not enjoy this property.
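The link between schema (6) and transparency can also be illustrated on a finite model. This hypothetical sketch checks only atomic instances $p \Rightarrow \mathcal{V}p$. It compares a transparent visibility function with a coarser one that merges two states. The names and the valuation are made up for illustration.

```python
from itertools import product

TRUE_AT = {"e1": {"p"}, "e2": {"p", "q"}, "e3": set()}

def pi(name, g):
    return name in TRUE_AT[g[0]]

def schema6_holds(G, v, atoms) -> bool:
    """Every atomic instance p => V p is true at every global state in G."""
    def visible(g, name):
        return all(pi(name, h) for h in G if h[0] in v(g[0]))
    return all(visible(g, name) for g in G for name in atoms if pi(name, g))

def transparent(e):
    return frozenset({e})

def coarse(e):  # hypothetical: e2 and e3 cannot be told apart
    return frozenset({e}) if e == "e1" else frozenset({"e2", "e3"})

G = list(product(["e1", "e2", "e3"], ["l0"]))
print(schema6_holds(G, transparent, ["p", "q"]))  # True
print(schema6_holds(G, coarse, ["p", "q"]))       # False: p holds at e2 but not at e3
```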



4 Visibility and Perception 

The fact that something is visible in an environment does not mean that an agent actually
sees it. What an agent does see is determined by its sensors, which in our formal model
are represented by the $see$ function. In this section, we extend our logic by introducing
a unary modal operator “$\mathcal{S}$”, which is intended to allow us to represent the information
that an agent sees. The intuitive meaning of a formula $\mathcal{S}\varphi$ is thus that the agent perceives
the information $\varphi$. Note that, as with the $\mathcal{V}$ operator, the argument to $\mathcal{S}$ is a proposition,
and not a term denoting an object.

In order to define the semantics of $\mathcal{S}$, we introduce a perception accessibility relation
$\sim_s \subseteq G_{Ag,Env} \times G_{Ag,Env}$ as follows: $(e, l) \sim_s (e', l')$ iff $see(v(e)) = see(v(e'))$. That is,
$g \sim_s g'$ iff the agent receives the same percept when the system is in state $g$ as it does in
state $g'$. Again, it is straightforward to see that $\sim_s$ is an equivalence relation. Note that,
for any of our models, it turns out that $\sim_v \subseteq \sim_s$: if $e' \in v(e)$, then $v(e') = v(e)$ because $v$
partitions $E$, and hence $see(v(e)) = see(v(e'))$.
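Both relations can be represented as Python predicates on global states, which makes the claims above easy to test on small examples. This sketch uses hypothetical names and a hypothetical lossy sensor. On one finite model, it checks that $\sim_s$ is an equivalence relation and that $\sim_v \subseteq \sim_s$.

```python
from itertools import product

def rel_v(v):
    """(e, l) ~v (e', l') iff e' is in v(e)."""
    return lambda g, h: h[0] in v(g[0])

def rel_s(v, see):
    """(e, l) ~s (e', l') iff see(v(e)) == see(v(e'))."""
    return lambda g, h: see(v(g[0])) == see(v(h[0]))

def is_equivalence(G, r) -> bool:
    reflexive = all(r(g, g) for g in G)
    symmetric = all(r(h, g) for g in G for h in G if r(g, h))
    transitive = all(r(g, k) for g in G for h in G for k in G if r(g, h) and r(h, k))
    return reflexive and symmetric and transitive

def contained_in(G, r1, r2) -> bool:
    """True iff every r1-related pair over G is also r2-related."""
    return all(r2(g, h) for g in G for h in G if r1(g, h))

# Hypothetical model: three visibility cells, and a lossy sensor that
# gives the same percept for {e1} and {e4}.
CELLS = [frozenset({"e1"}), frozenset({"e2", "e3"}), frozenset({"e4"})]

def v(e):
    return next(c for c in CELLS if e in c)

def see(cell):
    return "B" if cell == CELLS[1] else "A"

G = list(product(["e1", "e2", "e3", "e4"], ["l0", "l1"]))
print(is_equivalence(G, rel_s(v, see)), contained_in(G, rel_v(v), rel_s(v, see)))  # True True
```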

The semantic rule for $\mathcal{S}$ is: $(M, (e, l)) \models \mathcal{S}\varphi$ iff $(M, (e', l')) \models \varphi$ for all $(e', l') \in G_{Ag,Env}$
such that $(e, l) \sim_s (e', l')$. As $\sim_s$ is an equivalence relation, $\mathcal{S}$ will also validate
analogues of the S5 modal axioms KDT45:



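$$\mathcal{S}(\varphi \Rightarrow \psi) \Rightarrow ((\mathcal{S}\varphi) \Rightarrow (\mathcal{S}\psi)) \tag{7}$$

$$\mathcal{S}\varphi \Rightarrow \neg\mathcal{S}\neg\varphi \tag{8}$$

$$\mathcal{S}\varphi \Rightarrow \varphi \tag{9}$$

$$\mathcal{S}\varphi \Rightarrow \mathcal{S}(\mathcal{S}\varphi) \tag{10}$$

$$\neg\mathcal{S}\varphi \Rightarrow \mathcal{S}\neg\mathcal{S}\varphi \tag{11}$$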



It is worth asking whether these schemas are appropriate for a logic of perception. If we
were attempting to develop a logic of human perception, then an S5 logic would not be
acceptable. Human perception is often faulty, for example, which would lead us to reject schema (9).
We would almost certainly reject (11), for similar reasons. However, our interpretation
of $\mathcal{S}\varphi$ is that the percept received by the agent carries the information $\varphi$. Under this
interpretation, an S5 logic seems appropriate.

We now turn to the relationship between $\mathcal{V}$ and $\mathcal{S}$. Given two unary modal operators,
$\Box_1$ and $\Box_2$, the most important interactions between them can be summarised as follows:

$$\Box_1\varphi \Rightarrow \Box_2\varphi \tag{*}$$

We use (*) as the basis of our investigation of the relationship between $\mathcal{V}$ and $\mathcal{S}$. The
most important interaction axiom says that if an agent sees $\varphi$, then $\varphi$ must be visible.
It turns out that formula schema (12), which characterises this relationship, is valid;
this follows from the fact that $\sim_v \subseteq \sim_s$.

$$\mathcal{S}\varphi \Rightarrow \mathcal{V}\varphi \tag{12}$$

Turning to the converse direction, the next interaction says that if p is visible, then p is 
seen by the agent — in other words, the agent sees everything visible. 



Vp => Sp (13) 

Intuitively, this axiom characterises agents with “perfect” sensory apparatus, i.e., a $see$ function that never loses information. Formally, we will say a perception function $see$ is perfect iff it is an injection; otherwise we will say it is lossy. Lossy perception functions can map different visibility sets to the same percept, and hence, intuitively, lose information. It turns out that formula schema (13) is valid in a model if the perception function of that model is perfect.
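For a finite environment the perfect/lossy distinction can be checked mechanically. The sketch below is illustrative only: it assumes $see$ is represented as a Python dictionary from visibility sets (frozensets of hypothetical fact names such as `door_open`) to percepts, and the helper names `is_perfect` and `merged_visibility_sets` are our own, not part of the formalism.

```python
from collections import defaultdict

def merged_visibility_sets(see):
    """Group visibility sets by the percept they produce.

    `see` is a dict {visibility_set: percept}. Only groups with more than one
    visibility set are returned: these are the distinctions a lossy see loses.
    """
    by_percept = defaultdict(set)
    for vis_set, percept in see.items():
        by_percept[percept].add(vis_set)
    return {p: vs for p, vs in by_percept.items() if len(vs) > 1}

def is_perfect(see):
    """A perception function is perfect iff it is an injection."""
    return not merged_visibility_sets(see)

# Hypothetical environment with two facts that may be visible.
OPEN, LIGHT = "door_open", "light_on"
perfect_see = {
    frozenset(): "dark_closed",
    frozenset({OPEN}): "dark_open",
    frozenset({LIGHT}): "lit_closed",
    frozenset({OPEN, LIGHT}): "lit_open",
}
# A sensor that only registers the light: sets differing only in OPEN collide.
lossy_see = {vs: ("lit" if LIGHT in vs else "dark") for vs in perfect_see}
```

Here `is_perfect(perfect_see)` holds, while `lossy_see` merges `{light_on}` with `{door_open, light_on}`, so by the result above only the first function guarantees that schema (13) is valid.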



5 Perception and Knowledge 

We now extend our language $\mathcal{L}$ by the addition of a unary modal operator $\mathcal{K}$. The intuitive meaning of a formula $\mathcal{K}\varphi$ is that the agent knows $\varphi$. In order to give a semantics to $\mathcal{K}$, we introduce a knowledge accessibility relation $\sim_k \subseteq \mathcal{G}_{Ag,Env} \times \mathcal{G}_{Ag,Env}$ in the by-now conventional way [5, p111]: $(e,l) \sim_k (e',l')$ iff $l = l'$. As with $\sim_v$ and $\sim_s$, it is easy to see that $\sim_k$ is an equivalence relation. The semantic rule for $\mathcal{K}$ is as expected: $(M,(e,l)) \models \mathcal{K}\varphi$ iff $(M,(e',l')) \models \varphi$ for all $(e',l') \in \mathcal{G}_{Ag,Env}$ such that $(e,l) \sim_k (e',l')$. Obviously, as with $\mathcal{V}$ and $\mathcal{S}$, the $\mathcal{K}$ modality validates analogues of the modal axioms KDT45.



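$\mathcal{K}(\varphi \Rightarrow \psi) \Rightarrow ((\mathcal{K}\varphi) \Rightarrow (\mathcal{K}\psi))$ (14)

$\mathcal{K}\varphi \Rightarrow \neg\mathcal{K}\neg\varphi$ (15)

$\mathcal{K}\varphi \Rightarrow \varphi$ (16)

$\mathcal{K}\varphi \Rightarrow \mathcal{K}(\mathcal{K}\varphi)$ (17)

$\neg\mathcal{K}\varphi \Rightarrow \mathcal{K}\neg\mathcal{K}\varphi$ (18)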



We now turn to the relationship between what an agent perceives and what it knows. As 
with the relationship between S and V, the main interactions of interest are captured in 
(*). The first interaction we consider states that when an agent sees something, it knows 
it. 



$\mathcal{S}\varphi \Rightarrow \mathcal{K}\varphi$ (19)

Intuitively, this property will be true of an agent if its next state function distinguishes between every different percept received. If a next state function has this property, then intuitively, it never loses information from the percepts. We say a next state function is complete if it distinguishes between every different percept. Formally, a next state function $next$ is complete iff $next(l, p) = next(l', p')$ implies $p = p'$. Formula schema (19) is valid in a model iff the next state function of that model is complete.
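Completeness is also a simple finite check. In the sketch below, which is an illustration under our own assumptions, a next state function is a Python dictionary from `(local_state, percept)` pairs to new local states. The two example agents, `remember_last` and `toggle`, are hypothetical.

```python
from collections import defaultdict

def is_complete(next_state):
    """`next_state` is a dict {(local_state, percept): new_local_state}.

    Complete iff next(l, p) == next(l2, p2) implies p == p2, i.e. every
    resulting local state is produced by exactly one percept.
    """
    percepts_into = defaultdict(set)
    for (_, percept), new_state in next_state.items():
        percepts_into[new_state].add(percept)
    return all(len(ps) == 1 for ps in percepts_into.values())

LOCAL_STATES = [0, 1]
PERCEPTS = ["a", "b"]

# Stores the latest percept as its new local state: complete.
remember_last = {(l, p): p for l in LOCAL_STATES for p in PERCEPTS}
# Flips a bit on every step and ignores the percept: not complete.
toggle = {(l, p): 1 - l for l in LOCAL_STATES for p in PERCEPTS}
```

By the result stated above, a model built from `remember_last` validates schema (19), whereas `toggle` maps both percepts to the same state and so gives no such guarantee.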

Turning to the converse direction, we might expect the following schema to be valid: 

$\mathcal{K}\varphi \Rightarrow \mathcal{S}\varphi$ (20)

While this schema is satisfiable, it is not valid. To understand what kinds of agents validate this schema, imagine an agent with a next state function that chooses the next state solely on the basis of the percept it receives, irrespective of its current state. Let us say that an agent is local if it has this property. Formally, an agent’s next state function is local iff $next(l, p) = next(l', p)$ for all local states $l, l' \in L$ and percepts $p \in P$. It is not hard to see that formula schema (20) is valid in a model if the next state function of the agent in this model is local.
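Locality, in the formal sense just given, can be checked in the same dictionary representation used for illustration elsewhere in this section. This is again a sketch with hypothetical example agents. Note that `remember_last` is both local and complete, which matches the later observation that complete, local next state functions make $\mathcal{S}\varphi \Leftrightarrow \mathcal{K}\varphi$ valid.

```python
from collections import defaultdict

def is_local(next_state):
    """`next_state` is a dict {(local_state, percept): new_local_state}.

    Local iff next(l, p) == next(l2, p) for all local states l, l2 and each
    percept p, i.e. the new state is determined by the percept alone.
    """
    results_for = defaultdict(set)
    for (_, percept), new_state in next_state.items():
        results_for[percept].add(new_state)
    return all(len(states) == 1 for states in results_for.values())

LOCAL_STATES = [0, 1]
PERCEPTS = ["a", "b"]

# New state = latest percept: local (and also complete).
remember_last = {(l, p): p for l in LOCAL_STATES for p in PERCEPTS}
# Ignores the percept and keeps the current state: not local.
keep_state = {(l, p): l for l in LOCAL_STATES for p in PERCEPTS}
```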



6 Systems of VSK Logic 

The preceding sections identified the key interactions that may hold between what is 
true, visible, seen, and known. In this section, we consider systems of VSK logic, by 
which we mean possible combinations of interactions that could hold for any given 
agent-environment system. To illustrate, consider the class of systems in which: (i) the 
environment is not transparent; (ii) the agent’s perception function is perfect; and (iii) the 
agent’s next state function is neither complete nor local. In this class of models, the 
formula schemas (3), (12), and (13) are valid. These formula schemas can be understood 
as characterising a class of agent-environment systems — those in which the environment 
is not transparent, the agent’s perception function is perfect, and the agent’s next state 
function is neither complete nor local. In this way, by systematically considering the 
possible combinations of VSK formula schemas, we obtain a classification scheme for 
agent-environment systems. As the basis of this scheme, we consider only interaction 
schemas with the following form. 



$\Box_1\varphi \Rightarrow \Box_2\varphi$

Given the three VSK modalities, there are six such interaction schemas: (6), (3), (13), (12), (19), and (20). This in turn suggests there should be 64 distinct VSK systems. However, as (3) and (12) are valid in all VSK systems, there are in fact only 16 distinct systems, summarised in Table 1.

In systems VSK-8 to VSK-15 inclusive, visibility and truth are equivalent, in that everything true is also visible. These systems are characterised by transparent visibility relations. Formally, the schema $\varphi \Leftrightarrow \mathcal{V}\varphi$ is a valid formula in systems VSK-8 to VSK-15. The $\mathcal{V}$ modality is redundant in such systems.

In systems VSK-4 to VSK-7 and VSK-12 to VSK-15, everything visible is seen, and everything seen is visible. Visibility and perception are thus equivalent: the formula schema $\mathcal{V}\varphi \Leftrightarrow \mathcal{S}\varphi$ is valid in such systems. Hence one of the modalities $\mathcal{V}$ or $\mathcal{S}$ is redundant in systems VSK-4 to VSK-7 and VSK-12 to VSK-15. Models for these systems are characterised by agents with perfect perception ($see$) functions.
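Table 1. The sixteen possible VSK systems. A cross (×) indicates that the schema is valid in the corresponding system; all systems include (3) and (12).

| System | (6) $\varphi \Rightarrow \mathcal{V}\varphi$ | (3) $\mathcal{V}\varphi \Rightarrow \varphi$ | (13) $\mathcal{V}\varphi \Rightarrow \mathcal{S}\varphi$ | (12) $\mathcal{S}\varphi \Rightarrow \mathcal{V}\varphi$ | (19) $\mathcal{S}\varphi \Rightarrow \mathcal{K}\varphi$ | (20) $\mathcal{K}\varphi \Rightarrow \mathcal{S}\varphi$ |
|---|---|---|---|---|---|---|
| VSK-0 | | × | | × | | |
| VSK-1 | | × | | × | | × |
| VSK-2 | | × | | × | × | |
| VSK-3 | | × | | × | × | × |
| VSK-4 | | × | × | × | | |
| VSK-5 | | × | × | × | | × |
| VSK-6 | | × | × | × | × | |
| VSK-7 | | × | × | × | × | × |
| VSK-8 | × | × | | × | | |
| VSK-9 | × | × | | × | | × |
| VSK-10 | × | × | | × | × | |
| VSK-11 | × | × | | × | × | × |
| VSK-12 | × | × | × | × | | |
| VSK-13 | × | × | × | × | | × |
| VSK-14 | × | × | × | × | × | |
| VSK-15 | × | × | × | × | × | × |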



In systems VSK-3, VSK-7, VSK-11, and VSK-15, knowledge and perception are equivalent: an agent knows everything it sees, and sees everything it knows. In these systems, $\mathcal{S}\varphi \Leftrightarrow \mathcal{K}\varphi$ is valid. Models of such systems are characterised by complete, local next state functions.

In systems VSK-12 to VSK-15, we find that truth, visibility, and perception are equivalent: the schema $\varphi \Leftrightarrow \mathcal{V}\varphi \Leftrightarrow \mathcal{S}\varphi$ is valid. In such systems, the $\mathcal{V}$ and $\mathcal{S}$ modalities are redundant.

An analysis of individual VSK systems identifies a number of interesting properties, but space limitations prevent such an analysis here. We simply note that in system VSK-15, the formula schema $\varphi \Leftrightarrow \mathcal{V}\varphi \Leftrightarrow \mathcal{S}\varphi \Leftrightarrow \mathcal{K}\varphi$ is valid, and hence all three modalities $\mathcal{V}$, $\mathcal{S}$, and $\mathcal{K}$ are redundant. System VSK-15 thus collapses to propositional logic.
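The counting argument and the groupings above can be reproduced mechanically. Reading Table 1, the system number $n$ appears to be the binary encoding $n = 8\cdot[(6)] + 4\cdot[(13)] + 2\cdot[(19)] + [(20)]$, where $[s]$ is 1 if schema $s$ is valid. This encoding is our reading of the table, not something stated in the text. The sketch enumerates all 16 systems under that assumption and recovers the ranges quoted in this section.

```python
from itertools import product

# Schemas valid in every VSK system (Table 1 caption).
ALWAYS = ("(3)", "(12)")
# Schemas that vary between systems, most significant bit first. This bit
# order is an assumption read off Table 1; it reproduces VSK-0 ... VSK-15.
VARYING = ("(6)", "(13)", "(19)", "(20)")

def vsk_systems():
    """Map each system number n to the set of interaction schemas valid in VSK-n."""
    systems = {}
    for bits in product((0, 1), repeat=len(VARYING)):
        n = int("".join(map(str, bits)), 2)   # e.g. (1, 0, 1, 1) -> 11
        systems[n] = set(ALWAYS) | {s for s, b in zip(VARYING, bits) if b}
    return systems

def systems_validating(*schemas):
    """Sorted numbers of the systems in which all the given schemas are valid."""
    return sorted(n for n, valid in vsk_systems().items() if set(schemas) <= valid)

if __name__ == "__main__":
    # Systems where knowledge and perception coincide.
    print(systems_validating("(19)", "(20)"))   # [3, 7, 11, 15]
```

The same call with `"(6)"` gives VSK-8 to VSK-15, and with `"(6)", "(13)"` gives VSK-12 to VSK-15, matching the groupings described above.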



7 Related Work 

Since the mid 1980s, Halpern and colleagues have used modal epistemic logic for reasoning about multi-agent systems [5]. In this work, they demonstrated how interpreted systems could be used as models for such logics. Interpreted systems are very close to our agent-environment systems; the key differences are that they only record the state of agents within a system, and hence do not represent the percepts received by an agent or distinguish between what is true of an environment and what is visible of that environment. Halpern and colleagues have established a range of significant results relating to such logics, in particular, categorisations of the complexity of various decision problems in epistemic logic, the circumstances under which it is possible for a group of agents to achieve “common knowledge” about some fact, and most recently, the use of such logics for directly programming agents. Comparatively little effort has been devoted to characterising “architectural” properties of agents. The only obvious examples are the properties of no learning, perfect recall, and so on [5, pp281-307].

In their “situated automata” paradigm, Kaelbling and Rosenschein directly synthesised agents (in fact, digital circuits) from epistemic specifications of these agents [12]. While this work clearly highlighted the relationship between epistemic theories of agents and their realisation, it did not explicitly investigate axiomatic characterisations of architectural agent properties. Finally, recent work has considered knowledge-theoretic approaches to robotics [2].

Many other formalisms for reasoning about intelligent agents and multi-agent systems have been proposed over the past decade [16]. Following the pioneering work of Moore on the interaction between knowledge and action [9], most of these formalisms have attempted to characterise the “mental state” of agents engaged in various activities. Well-known examples of this work include Cohen-Levesque’s theory of intention [4], and the ongoing work of Rao-Georgeff on the belief-desire-intention (bdi) model of agency [10]. The emphasis in this work has been more on axiomatic characterisations of architectural properties; for example, in [11], Rao-Georgeff discuss how various axioms of bdi logic can be seen to intuitively correspond to properties of agent architectures. However, this work is specific to bdi architectures, and in addition, the correspondence is an intuitive one: they establish no formal correspondence, in the sense of VSK logic.

A number of authors have considered the problem of reasoning about actions that may be performed in order to obtain information. Again building on the work of Moore [9], the goal of such work is typically to develop representations of sensing actions that can be used in planning algorithms [1]. An example is [14], in which Scherl and Levesque develop a representation of sensing actions in the situation calculus [8]. These theories focus on giving an account of how the performance of a sensing action changes an agent’s knowledge state. Such theories are purely axiomatic in nature: no architectural correspondence is established between axioms and the models that they correspond to.

Finally, it is worth noting that there is now a growing body of work addressing the abstract logical properties of multi-modal logics, of which VSK is an example [3]. Lomuscio and Ryan, for example, investigate axiomatizations of multi-agent epistemic logic (epistemic logics with multiple $\mathcal{K}$ operators) [7]. The analysis in this paper can clearly benefit from such work.



8 Conclusions 

In this paper, we have presented a formalism that allows us to represent several key 
aspects of the relationship between an agent and the environment in which it is situated. 
Specifically, it allows us to distinguish between what is true of an environment and what 
is visible, or knowable about it; what is visible of an environment and what an agent 
actually perceives of it; and what an agent perceives of an environment and actually 
knows of it. Previous formalisms do not permit us to make such distinctions. 
For future work, a number of obvious issues present themselves:

- Completeness. 

First, completeness results for the formalism would be desirable: multi-modal logics are a burgeoning area of research, for which general completeness results are beginning to emerge.

- Multi-agent extensions. 

Another issue is extending the formalism to the multi-agent domain. It would be interesting to investigate such interactions as $\mathcal{K}_i\mathcal{V}_j\varphi$ (agent $i$ knows that $\varphi$ is visible to agent $j$).

- Temporal extensions. 

The emphasis in this work has been on classifying instantaneous relationships in VSK logic. Much work remains to be done in considering the temporal extensions to the logic, in much the same way that epistemic logic is extended into the temporal dimension in [15].

- Knowledge-based programs. 

The relationship between VSK logic and knowledge-based programs [5, Chapter 7] would also be an interesting area of future work: VSK logic has something to say about when such programs are implementable.
Abstract. The logic $\mathrm{S5}_n$ is widely used as the logic of knowledge for ideal agents in a multi-agent system. Some extensions of $\mathrm{S5}_n$ have been proposed for expressing knowledge sharing between the agents, but no systematic exploration of the possibilities has taken place. In this paper we present a spectrum of degrees of knowledge sharing by examining and classifying axioms expressing the sharing. We present completeness results and a diagram showing the relations between some of the principal extensions of $\mathrm{S5}_2$ and discuss their usefulness. The paper considers the case of a group of two agents of knowledge.



1 Introduction 

The modal logic $\mathrm{S5}_n$ (see for example [18,17]), whose mono-modal fragment S5 was first proposed in [7] to represent knowledge, has been used to model knowledge in multi-agent systems (MAS) for some years now [4]. The logic $\mathrm{S5}_n$ is a classical modal logic containing $n$ modalities $\Box_i$, where $i$ is in a set $A$ of agents, expressing the private knowledge of agent $i$. Results that extend the logic $\mathrm{S5}_n$ to model group properties such as common knowledge and distributed knowledge within a group of agents are also well known ([4,17]).

The logic $\mathrm{S5}_n$ models an ideal set of agents; in particular, agents enjoy positive and negative introspection and their knowledge is closed under implication. In other words, they are perfect reasoners.

A peculiarity of the logic $\mathrm{S5}_n$ is that there is no a priori relationship between the knowledge of the various agents. In some applications, however, this might not be what is desired. For example, a central processing unit $j$ of a collective map-making ([3]) robotic MAS should be told of any knowledge acquired by any other agent. Therefore the agent $j$ should know everything that is known by any agent. In the formal language of modal logic, under the usual assumptions of ideality, this scenario can be represented by $\mathrm{S5}_n$ enriched by the axiom:

$\Box_i p \Rightarrow \Box_j p$, for all $i \in A$.

Another example of knowledge sharing between agents concerns a MAS whose agents (databases in this example) have computation capabilities that can be ordered. If the agents are executing the same program on the same data, then it is reasonable to model the MAS by enriching the logic $\mathrm{S5}_n$ by:

$\Box_i p \Rightarrow \Box_j p$; $i \prec j$

where $\prec$ expresses the order in the computational power at the disposal of the agents. In these two cases, some information is being shared among the agents of the group.

A third example of sharing in the literature is the axiom

$\Diamond_i\Box_j p \Rightarrow \Box_j\Diamond_i p$, $i \neq j$

[2], which says that if agent $i$ considers it possible that agent $j$ knows $p$, then agent $j$ must know that agent $i$ considers it possible that $p$ is the case.

It is easy to imagine other meaningful axioms that express interactions between the agents in the system; clearly there is a spectrum of possible degrees of knowledge sharing. At one end of the spectrum is $\mathrm{S5}_n$, with no sharing at all. At the other end, there is $\mathrm{S5}_n$ together with

$\Box_i p \Leftrightarrow \Box_j p$, for all $i, j \in A$,

saying that the agents have precisely the same knowledge (total sharing). The three examples mentioned above exist somewhere in the (partially ordered) spectrum between these two extremes.

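Some instances of such systems have already been identified in [2,1,15] and in other papers. Our aim in this paper is to explore the spectrum systematically. We restrict our attention to the case of two agents (i.e. to extensions of $\mathrm{S5}_2$), and explore axiom schemas of the forms

$\blacksquare p \Rightarrow \blacksquare p$

$\blacksquare p \Rightarrow \blacksquare\blacksquare p$

$\blacksquare\blacksquare p \Rightarrow \blacksquare p$

$\blacksquare\blacksquare p \Rightarrow \blacksquare\blacksquare p$

where each occurrence of $\blacksquare$ is in the set $\{\Box_1, \Diamond_1, \Box_2, \Diamond_2\}$.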

Technically, we will prove correspondence properties and completeness for extensions of $\mathrm{S5}_2$ with axioms of these forms. Naturally, this will not give the complete picture: there may be interesting axioms of other forms than those listed above. However, analysis of the literature certainly suggests that most axioms studied for this purpose are of one of these forms. They are sufficient for expressing how knowledge and facts considered possible are related to each other up to a level of nesting of two, which is already significant for human intuition. Note also that the examples above are included in the axiom patterns.

The rest of this paper is organised as follows. In Section 1.1 we fix the notation and recall two known results that we will use extensively in the following. In Section 2 we analyse and discuss interaction axiom schemas of the form $\blacksquare p \Rightarrow \blacksquare p$. We then move to Section 3, where we discuss the case of a consequent composed of two modal operators. In Section 4 we analyse the interaction axioms resulting from two nested modalities both in the antecedent and in the consequent. Finally, in Section 5 we present the spectrum of interaction axioms that is generated.
1.1 Preliminaries

Our syntax is the standard bi-modal language $\mathcal{L}$, defined from a set $P$ of propositional variables:

$\phi ::= p \mid \neg\phi \mid \phi_1 \wedge \phi_2 \mid \Box_i\phi$

where $p \in P$, $i \in \{1, 2\}$.

As standard, we use bi-modal Kripke frames $F = (W, R_1, R_2)$ and models $M = (W, R_1, R_2, \pi)$ ([13]) to interpret the language $\mathcal{L}$. Interpretation, satisfaction and validity are defined as standard (see for example [10]).

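The paper is devoted to extensions of $\mathrm{S5}_2$, which is defined by the following Hilbert-style axioms and inference rules.

Taut: $\vdash_{\mathrm{S5}_2} t$, where $t$ is any propositional tautology

K: $\vdash_{\mathrm{S5}_2} \Box_i(p \Rightarrow q) \Rightarrow (\Box_i p \Rightarrow \Box_i q)$

T: $\vdash_{\mathrm{S5}_2} \Box_i p \Rightarrow p$

4: $\vdash_{\mathrm{S5}_2} \Box_i p \Rightarrow \Box_i\Box_i p$

5: $\vdash_{\mathrm{S5}_2} \Diamond_i p \Rightarrow \Box_i\Diamond_i p$

US: If $\vdash_{\mathrm{S5}_2} \phi$, then $\vdash_{\mathrm{S5}_2} \phi[\psi_1/p_1, \dots, \psi_n/p_n]$

MP: If $\vdash_{\mathrm{S5}_2} \phi$ and $\vdash_{\mathrm{S5}_2} \phi \Rightarrow \psi$, then $\vdash_{\mathrm{S5}_2} \psi$

Nec: If $\vdash_{\mathrm{S5}_2} \phi$, then $\vdash_{\mathrm{S5}_2} \Box_i\phi$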

In the above the index i is in {1, 2}. 

The symbol $\vdash$ means provability in that logic, or in the extension under consideration. By $\mathrm{S5}_2 + \{\phi\}$ we denote the extension of $\mathrm{S5}_2$ in which the formula $\phi$ is added to the axioms.

The following is also widely known. 

Theorem 1. The logic $\mathrm{S5}_2$ is sound and complete with respect to equivalence frames $F = (W, \sim_1, \sim_2)$.

We will always be working in the class of equivalence frames.

We also recall a standard lemma that we will use in this paper. 

Lemma 1. For any $\phi \in \mathcal{L}$, we have $\vdash \Diamond_i\phi \Leftrightarrow \Box_i\Diamond_i\phi$ and $\vdash \Box_i\phi \Leftrightarrow \Diamond_i\Box_i\phi$, where $i \in A$.



2 Interaction Axioms of the Form $\blacksquare p \Rightarrow \blacksquare p$

We start with extensions of $\mathrm{S5}_2$ with respect to interaction axioms that can be expressed as:

$\blacksquare\phi \Rightarrow \blacksquare\phi$, where $\blacksquare \in \{\Box_1, \Diamond_1, \Box_2, \Diamond_2\}$. (1)

There are 16 axioms of this form; factoring 1-2 symmetries reduces this number to 8, of which 4 are already consequences of $\mathrm{S5}_2$ and therefore do not generate proper extensions$^1$. The remaining 4 are proper extensions of $\mathrm{S5}_2$ and give rise to correspondence properties as described in Fig. 1.

1 The four are $\Box_1 p \Rightarrow \Box_1 p$, $\Box_1 p \Rightarrow \Diamond_1 p$, $\Box_1 p \Rightarrow \Diamond_2 p$, and $\Diamond_1 p \Rightarrow \Diamond_1 p$.
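| Interaction Axioms | Completeness |
|---|---|
| $\Box_1 p \Rightarrow \Box_2 p$ | $\sim_2 \subseteq \sim_1$ |
| $\Diamond_1 p \Rightarrow \Box_1 p$ | $\sim_1 = id_W$ |
| $\Diamond_1 p \Rightarrow \Box_2 p$ | $\sim_1 = \sim_2 = id_W$ |
| $\Diamond_1 p \Rightarrow \Diamond_2 p$ | $\sim_1 \subseteq \sim_2$ |

Fig. 1. Proper extensions of $\mathrm{S5}_2$ generated by axioms of the form $\blacksquare\phi \Rightarrow \blacksquare\phi$. Formulas not included in the table but which are instances of this schema give completeness with respect to equivalence frames.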



Theorem 2. An equivalence frame F validates one of the axioms in Figure 1 if and only 
if F has the corresponding property. 
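Theorem 2 can be sanity-checked by brute force on small frames. The sketch below is our own illustration, not the authors' method. It enumerates every pair of equivalence relations on $n$ worlds (as pairs of partitions), evaluates each Fig. 1 axiom under every valuation of $p$, and compares frame validity with the listed property. Agreement for $n \le 4$ is evidence, not a proof, and all helper names are hypothetical.

```python
from itertools import product

def partitions(xs):
    """Yield every partition of the list xs; each is an equivalence relation."""
    if not xs:
        yield []
        return
    head, tail = xs[0], xs[1:]
    for part in partitions(tail):
        for i in range(len(part)):                 # put head into an existing block
            yield part[:i] + [[head] + part[i]] + part[i + 1:]
        yield [[head]] + part                      # or into a new singleton block

def classes(part):
    """Map each world w to its equivalence class [w]."""
    return {w: frozenset(block) for block in part for w in block}

def box(c, s):
    """Worlds where box p holds, given the extension s of p."""
    return frozenset(w for w in c if c[w] <= s)

def dia(c, s):
    """Worlds where diamond p holds."""
    return frozenset(w for w in c if c[w] & s)

def frame_valid(axiom, c1, c2, worlds):
    """An implication is frame-valid iff lhs is a subset of rhs for every valuation of p."""
    for bits in product((False, True), repeat=len(worlds)):
        s = frozenset(w for w, b in zip(worlds, bits) if b)
        lhs, rhs = axiom(c1, c2, s)
        if not lhs <= rhs:
            return False
    return True

def included(ca, cb):          # ~a is a subset of ~b
    return all(ca[w] <= cb[w] for w in ca)

def identity(c):               # ~ is id_W
    return all(len(c[w]) == 1 for w in c)

# Fig. 1 rows: (name, axiom as (antecedent, consequent) extensions, property).
FIG1 = [
    ("[]1 p -> []2 p", lambda c1, c2, s: (box(c1, s), box(c2, s)),
     lambda c1, c2: included(c2, c1)),
    ("<>1 p -> []1 p", lambda c1, c2, s: (dia(c1, s), box(c1, s)),
     lambda c1, c2: identity(c1)),
    ("<>1 p -> []2 p", lambda c1, c2, s: (dia(c1, s), box(c2, s)),
     lambda c1, c2: identity(c1) and identity(c2)),
    ("<>1 p -> <>2 p", lambda c1, c2, s: (dia(c1, s), dia(c2, s)),
     lambda c1, c2: included(c1, c2)),
]

def counterexamples(n):
    """(axiom, partition1, partition2) triples on which Fig. 1 would be wrong."""
    worlds = list(range(n))
    parts = list(partitions(worlds))
    bad = []
    for p1, p2 in product(parts, repeat=2):
        c1, c2 = classes(p1), classes(p2)
        for name, axiom, prop in FIG1:
            if frame_valid(axiom, c1, c2, worlds) != prop(c1, c2):
                bad.append((name, p1, p2))
    return bad
```

Running `counterexamples(3)` and `counterexamples(4)` should return empty lists if the correspondences hold on those frames.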



Theorem 3. All the logics $\mathrm{S5}_2 + \{\phi\}$, where $\phi$ is a conjunction of formulas expressible from axiom schema (1), are sound and complete with respect to the intersection of the respective classes of frames reported in Fig. 1.

The results of Theorem 3 are quite well known. The most important logic is probably the one that forces the knowledge of one agent to be a subset of the knowledge of another. In Section 1 we discussed two scenarios in which this can prove useful. Stronger logics can be defined by assuming that the modal component for one of the agents collapses onto the propositional calculus. When this happens, we are in a situation in which “being possible according to one agent” is equivalent to “being known”, and this in turn is equivalent to “being true”. This is clearly a very strong constraint, which limits the expressivity of our language. Still, these logics can be shown to be consistent.

The strongest consistent logic is $\mathrm{Triv}_2$,$^2$ which can be defined by adding the axiom $\Diamond_1 p \Rightarrow \Box_2 p$ to $\mathrm{S5}_2$, or equivalently by adding both $\Diamond_1 p \Rightarrow \Box_1 p$ and $\Diamond_2 p \Rightarrow \Box_2 p$. In this logic the two agents have equal knowledge, which is equivalent to the truth at the world of evaluation.



3 Interaction Axioms of the Form $\blacksquare p \Rightarrow \blacksquare\blacksquare p$

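There are 64 axioms of the shape

$\blacksquare\phi \Rightarrow \blacksquare\blacksquare\phi$, where $\blacksquare \in \{\Box_1, \Diamond_1, \Box_2, \Diamond_2\}$. (2)

Factoring 1-2 symmetries reduces this number to 32. Again, many of these (14 in number) do not generate proper extensions of $\mathrm{S5}_2$.$^3$ For the remaining 18, the completeness results for the extensions they generate are more complicated than the ones in the previous section.

2 $\mathrm{Triv}_2$ is the logic obtained from $\mathrm{S5}_2$ by adding $\Box_1 p \Leftrightarrow p$ and $\Box_2 p \Leftrightarrow p$.

3 $\Box_1 p \Rightarrow \Diamond_1\Box_1 p$, $\Box_1 p \Rightarrow \Diamond_1\Diamond_1 p$, $\Box_1 p \Rightarrow \Diamond_1\Diamond_2 p$, $\Box_1 p \Rightarrow \Box_1\Box_1 p$, $\Box_1 p \Rightarrow \Box_1\Diamond_1 p$, $\Box_1 p \Rightarrow \Box_1\Diamond_2 p$, $\Box_1 p \Rightarrow \Diamond_2\Box_1 p$, $\Box_1 p \Rightarrow \Diamond_2\Diamond_1 p$, $\Box_1 p \Rightarrow \Box_2\Diamond_2 p$, $\Box_1 p \Rightarrow \Diamond_2\Diamond_2 p$, $\Diamond_1 p \Rightarrow \Diamond_1\Diamond_2 p$, $\Diamond_1 p \Rightarrow \Diamond_2\Diamond_1 p$, $\Diamond_1 p \Rightarrow \Diamond_1\Diamond_1 p$, and $\Diamond_1 p \Rightarrow \Box_1\Diamond_1 p$.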
We present one result in detail (the reader is referred to [14] for the other proofs): it concerns the axiom

$\Box_1 p \Rightarrow \Diamond_1\Box_2 p$.



Lemma 2. Let $F$ be an equivalence frame. $F \models \Box_1 p \Rightarrow \Diamond_1\Box_2 p$ if and only if $F$ is such that $\forall w\,\exists w' \in [w]_{\sim_1} : [w']_{\sim_2} \subseteq [w]_{\sim_1}$.

($[w]_{\sim_i}$ is the $\sim_i$-equivalence class of $w$.)

Proof. From right to left: consider any model $M$ and a point $w$ in it such that $M \models_w \Box_1 p$. So, for every point $w'$ such that $w \sim_1 w'$, we have $M \models_{w'} p$. But, by assumption, there exists a point $w' \in [w]_{\sim_1}$ such that $[w']_{\sim_2} \subseteq [w]_{\sim_1}$. So $p$ holds at every point of the equivalence class $[w']_{\sim_2}$, and so $M \models_{w'} \Box_2 p$. Therefore $M \models_w \Diamond_1\Box_2 p$.

For the converse, suppose that $F$ validates the axiom but the relational property above does not hold. Then there exists a point $w$ in $F$ such that for every $w' \in [w]_{\sim_1}$ we have $[w']_{\sim_2} \not\subseteq [w]_{\sim_1}$, i.e. there exists a point $w'' \in [w']_{\sim_2}$ such that $w'' \notin [w]_{\sim_1}$. Consider a valuation $\pi$ such that $\pi(p) = \{w' \mid w \sim_1 w'\}$. We have $(F, \pi) \models_w \Box_1 p$ and $(F, \pi) \not\models_{w''} p$. So $(F, \pi) \not\models_{w'} \Box_2 p$ for every $w' \in [w]_{\sim_1}$. So we have $(F, \pi) \not\models_w \Diamond_1\Box_2 p$, which is absurd.



Lemma 3. The logic $\mathrm{S5}_2 + \{\Box_1 p \Rightarrow \Diamond_1\Box_2 p\}$ is sound and complete with respect to equivalence frames satisfying the property $\forall w\,\exists w' \in [w]_{\sim_1} : [w']_{\sim_2} \subseteq [w]_{\sim_1}$.

Proof. Soundness was proven in the first part of Lemma 2.

For completeness we prove that the logic $\mathrm{S5}_2 + \{\Box_1 p \Rightarrow \Diamond_1\Box_2 p\}$ is canonical. In order to do that, suppose, by contradiction, that the frame of the canonical model does not satisfy the relational property above. Then it must be that there exists a point $w$ such that:

$\forall w' \in [w]_{\sim_1}\ \exists w'' : w' \sim_2 w''$ and $w \not\sim_1 w''$.

Call $w'_1, \dots, w'_n, \dots$ the points in $[w]_{\sim_1}$, and $w''_i$ the point in $[w'_i]_{\sim_2}$ such that $w \not\sim_1 w''_i$; $i = 1, \dots, n, \dots$. Recall (see for example [10], page 118) that $w \sim_1 w'$ on the canonical model is defined as $\forall\alpha \in \mathcal{L}\,(\Box_1\alpha \in w$ implies $\alpha \in w')$; $w \not\sim_1 w'$ is defined as $\exists\alpha \in \mathcal{L}\,(\Box_1\alpha \in w$ and $\neg\alpha \in w')$. So we can find some formulas $\alpha_i \in \mathcal{L}$, $i = 1, \dots, n, \dots$, such that $\Box_1\alpha_i \in w$ and $\neg\alpha_i \in w''_i$.

Call $\alpha = \bigwedge_i \alpha_i$; we have $\Box_1\alpha_i \in w$ for $i = 1, \dots, n, \dots$, so $\Box_1\alpha \in w$. But $\neg\alpha \in w''_i$ for $i = 1, \dots, n, \dots$, so $\Diamond_2\neg\alpha \in w'_i$ for every $i$ in $\{1, \dots, n, \dots\}$. So $\Box_1\Diamond_2\neg\alpha \in w$, i.e. $\neg\Diamond_1\Box_2\alpha \in w$. But $\Box_1\alpha \in w$ and $\vdash \Box_1\alpha \Rightarrow \Diamond_1\Box_2\alpha$, so $w$ would be inconsistent. Therefore the canonical frame must satisfy the property above, and the logic is complete with respect to equivalence frames satisfying the property $\forall w\,\exists w' \in [w]_{\sim_1} : [w']_{\sim_2} \subseteq [w]_{\sim_1}$.

Similar results hold for the other 17 axioms of the form (2), and the situation is summarised in Fig. 2. See [14] for full details.

Theorem 4. All the logics $\mathrm{S5}_2 + \{\phi\}$, where $\phi$ is a conjunction of formulas expressible from axiom schema (2), are sound and complete with respect to the intersection of the respective classes of frames reported in Fig. 2.
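| Interaction Axioms | Completeness |
|---|---|
| $\Box_1 p \Rightarrow \Diamond_1\Box_2 p$ | $\forall w\,\exists w' \in [w]_{\sim_1} : [w']_{\sim_2} \subseteq [w]_{\sim_1}$ |
| $\Box_1 p \Rightarrow \Box_2\Box_1 p$ | $\sim_2 \subseteq \sim_1$ |
| $\Box_1 p \Rightarrow \Box_2\Box_2 p$ | $\sim_2 \subseteq \sim_1$ |
| $\Box_1 p \Rightarrow \Box_2\Diamond_1 p$ | $\sim_2 \subseteq \sim_1$ |
| $\Box_1 p \Rightarrow \Box_1\Box_2 p$ | $\sim_2 \subseteq \sim_1$ |
| $\Box_1 p \Rightarrow \Diamond_2\Box_2 p$ | $\sim_2 \subseteq \sim_1$ |
| $\Diamond_1 p \Rightarrow \Diamond_1\Box_1 p$ | $\sim_1 = id_W$ |
| $\Diamond_1 p \Rightarrow \Diamond_1\Box_2 p$ | $\sim_2 = id_W$ |
| $\Diamond_1 p \Rightarrow \Box_1\Box_1 p$ | $\sim_1 = id_W$ |
| $\Diamond_1 p \Rightarrow \Box_1\Box_2 p$ | $\sim_1 = \sim_2 = id_W$ |
| $\Diamond_1 p \Rightarrow \Box_1\Diamond_2 p$ | $\sim_1 \subseteq \sim_2$ |
| $\Diamond_1 p \Rightarrow \Diamond_2\Box_1 p$ | $\sim_1 = id_W$ |
| $\Diamond_1 p \Rightarrow \Diamond_2\Box_2 p$ | $\sim_1 = \sim_2 = id_W$ |
| $\Diamond_1 p \Rightarrow \Diamond_2\Diamond_2 p$ | $\sim_1 \subseteq \sim_2$ |
| $\Diamond_1 p \Rightarrow \Box_2\Box_1 p$ | $\sim_1 = \sim_2 = id_W$ |
| $\Diamond_1 p \Rightarrow \Box_2\Box_2 p$ | $\sim_1 = \sim_2 = id_W$ |
| $\Diamond_1 p \Rightarrow \Box_2\Diamond_1 p$ | $\sim_2 \subseteq \sim_1$ |
| $\Diamond_1 p \Rightarrow \Box_2\Diamond_2 p$ | $\sim_1 \subseteq \sim_2$ |

Fig. 2. Proper extensions of $\mathrm{S5}_2$ generated by axioms of the form $\blacksquare\phi \Rightarrow \blacksquare\blacksquare\phi$. Formulas not included in the table but which are instances of the schema give completeness with respect to equivalence frames.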



Among all these axioms, the most intuitive ones in terms of knowledge are probably $\Box_1 p \Rightarrow \Box_2\Box_1 p$ and its “dual” $\Box_2 p \Rightarrow \Box_1\Box_2 p$, in which one agent knows that the other knows something every time this happens to be the case. It is interesting to see that this is equivalent to one agent knowing everything known by the other agent.

A more subtle, independent axiom expressed by axiom schema (2) is the formula$^4$

$\Box_1 p \Rightarrow \Diamond_1\Box_2 p$,



which reads “If agent 1 knows $p$, then he considers it possible that agent 2 also knows $p$”. The above is an axiom that regulates a natural kind of “prudence” assumption of agent 1 in terms of what knowledge agent 2 may have. This is meaningful in MAS in which agents have similar characteristics. In these scenarios, when an agent knows a fact, it may be appropriate to assume that the other agent, by acquiring the same information from the environment and by following the same reasoning, could have reached the same conclusion. Note that humans very often act as if they followed this axiom.

We leave it to the reader to explore other interactions from the table above. 

Note that by taking the contrapositive of axiom schema (2) we can express axioms of the form $\blacksquare\blacksquare p \Rightarrow \blacksquare p$. So all those axioms are also covered in this section. For simplicity we do not report the case of antecedents indexed as 2, but by applying symmetry it is straightforward to generate the corresponding axioms.

4 The technical details of this formula have been discussed in Lemma 2 and Lemma 3. 
4 Interaction Axioms of the Form $\blacksquare\blacksquare p \Rightarrow \blacksquare\blacksquare p$

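We now discuss the most complex class of axioms we will see in this paper, i.e., extensions of $\mathrm{S5}_2$ with interaction axioms expressible as:

$\blacksquare\blacksquare\phi \Rightarrow \blacksquare\blacksquare\phi$, where $\blacksquare \in \{\Box_1, \Diamond_1, \Box_2, \Diamond_2\}$. (3)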

Of the 256 such axioms, we lose half by 1-2 symmetry. Of the remaining 128, 64 have an antecedent whose two operators carry the same index ($i = j$); by well-known S5 equivalences (Lemma 1), these collapse to a case of the previous section. The remaining 64 axioms divide into 26 which do not induce proper extensions of $\mathrm{S5}_2$,$^5$ and 38 which do. Figure 3 summarises the result for the proper extensions.

Some results for axioms of the form of axiom schema (3) are already available in the literature:

Lemma 4 ([2]). The logic $\mathrm{S5}_2 + \{\Diamond_1\Box_2 p \Rightarrow \Box_2\Diamond_1 p\}$ is sound and complete with respect to frames satisfying the property: for all $w, w_1, w_2 \in W$ such that $w \sim_1 w_1$ and $w \sim_2 w_2$, there exists a point $w'$ such that $w_1 \sim_2 w'$ and $w_2 \sim_1 w'$.
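The frame-correspondence half of Lemma 4 can be spot-checked on small equivalence frames. This sketch checks only that the axiom and the confluence property pick out the same finite frames; it says nothing about the completeness proof in [2]. The encoding and names are our own assumptions.

```python
from itertools import product

def partitions(xs):
    """Yield every partition of the list xs (one per equivalence relation)."""
    if not xs:
        yield []
        return
    head, tail = xs[0], xs[1:]
    for part in partitions(tail):
        for i in range(len(part)):
            yield part[:i] + [[head] + part[i]] + part[i + 1:]
        yield [[head]] + part

def classes(part):
    return {w: frozenset(block) for block in part for w in block}

def box(c, s):
    return frozenset(w for w in c if c[w] <= s)

def dia(c, s):
    return frozenset(w for w in c if c[w] & s)

def frame_valid(axiom, c1, c2, worlds):
    for bits in product((False, True), repeat=len(worlds)):
        s = frozenset(w for w, b in zip(worlds, bits) if b)
        lhs, rhs = axiom(c1, c2, s)
        if not lhs <= rhs:
            return False
    return True

def lemma4_axiom(c1, c2, s):
    """<>1 []2 p -> []2 <>1 p."""
    return dia(c1, box(c2, s)), box(c2, dia(c1, s))

def lemma4_property(c1, c2):
    """w ~1 w1 and w ~2 w2 imply some w' with w1 ~2 w' and w2 ~1 w'."""
    return all(c2[w1] & c1[w2] for w in c1 for w1 in c1[w] for w2 in c2[w])

def mismatches(n):
    worlds = list(range(n))
    parts = list(partitions(worlds))
    out = []
    for p1, p2 in product(parts, repeat=2):
        c1, c2 = classes(p1), classes(p2)
        if frame_valid(lemma4_axiom, c1, c2, worlds) != lemma4_property(c1, c2):
            out.append((p1, p2))
    return out
```

A frame violating the property is $\sim_1$ with classes $\{0,1\},\{2\}$ and $\sim_2$ with classes $\{0,2\},\{1\}$: from world 0, the worlds 1 and 2 have no common successor.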

Figure 3 shows that many of the extensions are equivalent to some logic examined 
in the previous sections. For example, we have the following. 

Lemma 5. The logic $\mathrm{S5}_2 + \{\Diamond_1\Box_2 p \Rightarrow \Box_1\Box_2 p\}$ is sound and complete with respect to equivalence frames such that $\sim_1 \subseteq \sim_2$.

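Proof. We show that the logic $\mathrm{S5}_2 + \{\Diamond_1\Box_2 p \Rightarrow \Box_1\Box_2 p\}$ is equivalent to the logic $\mathrm{S5}_2 + \{\Diamond_1 p \Rightarrow \Diamond_2 p\}$. In order to see this, we prove that:

$\vdash_{\mathrm{S5}_2 + \{\Diamond_1\Box_2 p \Rightarrow \Box_1\Box_2 p\}} \Diamond_1 p \Rightarrow \Diamond_2 p$ and $\vdash_{\mathrm{S5}_2 + \{\Diamond_1 p \Rightarrow \Diamond_2 p\}} \Diamond_1\Box_2 p \Rightarrow \Box_1\Box_2 p$,

where $\mathrm{S5}_2 + \phi$ is the logic (closed under uniform substitution) obtained from $\mathrm{S5}_2$ by adding the formula $\phi$.

From left to right. Suppose $\Diamond_1\Box_2 p \Rightarrow \Box_1\Box_2 p$. We have $\Diamond_1 p \Rightarrow \Diamond_1\Diamond_2 p$ by T. But since, by contraposition of the hypothesis (with $\neg p$ substituted for $p$), we have $\Diamond_1\Diamond_2 p \Rightarrow \Box_1\Diamond_2 p$, we obtain $\Diamond_1 p \Rightarrow \Box_1\Diamond_2 p$, which in turn (by T) implies $\Diamond_1 p \Rightarrow \Diamond_2 p$.

From right to left. Suppose $\Diamond_1 p \Rightarrow \Diamond_2 p$ and substitute $\Box_2 p$ for $p$ in it. We obtain $\Diamond_1\Box_2 p \Rightarrow \Diamond_2\Box_2 p$, which is equivalent to $\Diamond_1\Box_2 p \Rightarrow \Box_2 p$. Now, by necessitating by $\Box_1$ and distributing the box by using axiom K, we obtain $\Box_1\Diamond_1\Box_2 p \Rightarrow \Box_1\Box_2 p$, which, given Lemma 1, gives us the result $\Diamond_1\Box_2 p \Rightarrow \Box_1\Box_2 p$.

Since each of the two formulas above can be proven from the other within $\mathrm{S5}_2$, any proof of a formula in one logic can be repeated in the other. Now, since $\mathrm{S5}_2 + \{\Diamond_1 p \Rightarrow \Diamond_2 p\}$ is complete (see Figure 1) with respect to equivalence frames such that $\sim_1 \subseteq \sim_2$, $\mathrm{S5}_2 + \{\Diamond_1\Box_2 p \Rightarrow \Box_1\Box_2 p\}$ is also sound and complete with respect to the same class of frames.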
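A direct consequence of the proof is that the two axioms are valid on exactly the same frames, namely those with $\sim_1 \subseteq \sim_2$. The sketch below checks this exhaustively on small equivalence frames. It is an illustration under our own encoding, and the helper names are hypothetical.

```python
from itertools import product

def partitions(xs):
    """Yield every partition of the list xs (one per equivalence relation)."""
    if not xs:
        yield []
        return
    head, tail = xs[0], xs[1:]
    for part in partitions(tail):
        for i in range(len(part)):
            yield part[:i] + [[head] + part[i]] + part[i + 1:]
        yield [[head]] + part

def classes(part):
    return {w: frozenset(block) for block in part for w in block}

def box(c, s):
    return frozenset(w for w in c if c[w] <= s)

def dia(c, s):
    return frozenset(w for w in c if c[w] & s)

def frame_valid(axiom, c1, c2, worlds):
    for bits in product((False, True), repeat=len(worlds)):
        s = frozenset(w for w, b in zip(worlds, bits) if b)
        lhs, rhs = axiom(c1, c2, s)
        if not lhs <= rhs:
            return False
    return True

def lemma5_axiom(c1, c2, s):
    """<>1 []2 p -> []1 []2 p."""
    b2 = box(c2, s)
    return dia(c1, b2), box(c1, b2)

def fig1_axiom(c1, c2, s):
    """<>1 p -> <>2 p."""
    return dia(c1, s), dia(c2, s)

def included(ca, cb):
    """~a is a subset of ~b."""
    return all(ca[w] <= cb[w] for w in ca)

def agree(n):
    """True iff both axioms and ~1 being a subset of ~2 coincide on all frames of size n."""
    worlds = list(range(n))
    parts = list(partitions(worlds))
    for p1, p2 in product(parts, repeat=2):
        c1, c2 = classes(p1), classes(p2)
        a = frame_valid(lemma5_axiom, c1, c2, worlds)
        b = frame_valid(fig1_axiom, c1, c2, worlds)
        if not (a == b == included(c1, c2)):
            return False
    return True
```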

5 DiD2 p => OiDip, did 2 p => Oid 2 p, 0\0 2 p => O 1 O 1 p, did 2 p => Oi 0 2 p, did2 p => 

□ ldlP, dld 2 p => □ 1 n 2 p, dl 0 2 p =>■ DiOiP, Oid2P => □1O2P, □ld 2 p => <>2dip, 

□ id 2 p =£■ O 2 GI 2 P, dld2 P =► ^2^>lP, dld2 P =► 0 2 0 2 p, Ol d 2 p =► 0 2 0 2 p, □id 2 p =>■ 

□ 2O1P, 0\0 2 p =>■ d2<>2P, Oi d 2 p => O1CI2 p, OiD 2 p => O1O1P, 0\0 2 p =>■ O1O2P, 
Oi 0 2 p => diOlp, Old 2P => O2O1P, O1O2P => O1O2P, dl<>2P => O1O2P, dj<>2P =>■ 

□ 1O2P, di<>2P => O2O1 p, Ox<> 2 p =£■ O2O2P and di<>2P => d 2 0 2 p. 
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Interaction Axioms 


Completeness 


□ 1O2 p => DiOip 


Mw 3 w' £ [w]~i : [u /]~ 2 C 


□ 1O2 p => OiOip 


Vitf 3 «/ £ [wj^j : [«/]~ 2 C 


□ 1O2 p =A O1CI2 p 


?Viw 3 tt/ £ [wj^j : [«/]~ 2 = {«/} 


O1O2 P => 02^1 p 


rsj i = idw 


OiD 2 p => DiDip 


^1^2 


OiD 2 p => OiDip 


^1^2 


OiD 2 p =A diCbp 


^1^2 


OiD 2 p => FI1O2P 


^1^2 


OiD 2 p => □2FI1P 


^1^2 


O1O2P =>• U 2 ^ 2 P 


^1^2 


OlH 2 p =>• O2O2P 


^1^2 


OlD 2 p => <> 2 a iP 


^1^2 


OiD 2 P =>• 02 ^ 2 P 


^1^2 


OiD 2 p O2O2P 


^1^2 


O1O2 p =>■ DiOaP 


^1^2 


O1O2P =>• O2O2P 


^1^2 


O1O2P => O2O2 p 


^1^2 


O1O2P => DiDip 


~i=~2= idw 


O1O2P => DiDap 


~i=~2= idw 


O1O2P => DaOip 


~i=~2= idw 


O1O2P => n 2 n 2 P 


~i=~2= idw 


O1O2P =>■ OiOip 


~i=~2= idw 


O1O2P =>• O2D2 P 


~1=~2= idw 


□ 1O2P =A DiDip 


rs - , 2 = idw 


□ 1O2P => Or Dip 


rs - , 2 = idw 


□ 1O2P =A CI2O1P 


r^ 2 C^l 


O1O2P =*> DiOip 


^ 2 C^i 


O1O2P => DaOip 


rs^ 2 C^i 


O1O2P =>■ OiOip 


rs^ 2 C~l 


□ 1O2P => diCbp 


~2 = 


□ 1O2P => Cbdip 


~2= idw 


□ 1O2P => D 2 ^ 2 P 


r ^ j 2 = idw 


□ 1O2P =>■ 02 ^ 2 ^ 


~2= idw 


O1O2P =>■ O1O2P 


'■'-'2= idw 


□ lD 2 p => D2FI1P 


w ~i wi, w ~ 2 W2 ^ 3 wJ : 

Wl ~2 1^2 w 


O1O2P =A O2O1P 


w i^i, w ~ 2 W2 => : 

UJl ~2 tn, W 2 


OiD 2 p => n 2 Oip 


W Wi, W ~2 W 2 => 3 w : 

U >1 ~2 tU, W 2 w 


□ 1O2P => O2CI1 p 


? Either ~i= idw or ~2= idw 



Fig. 3 . Proper extensions of S 52 generated by axioms of the form □□ </>=>□□ </). For axioms 
listed with “?” correspondence is proved but completeness is only conjectured. 
Axioms of shape (3) are intrinsically much harder (from a model-theoretic point of view) to examine with the basic tools than any other examined so far, because they can express antecedents of the form $\Box_1\Diamond_2$. These axioms represent knowledge of agent 1 about facts considered possible by agent 2. Technically, these formulas remind us of the McKinsey axiom of mono-modal logic, which represented a challenging problem for logicians for many years and was solved not too long ago by Goldblatt [5].

Consider the axiom $\Box_1\Diamond_2 p \Rightarrow \Diamond_1\Box_2 p$. With this axiom we rule out situations in which agent 1 knows that $p$ is considered possible by agent 2 and agent 1 also knows that $\neg p$ is considered possible by agent 2.

Definition 1. A point $w \in W$ is called an $i$-dead-end if for all $w' \in W$ we have that $w \sim_i w'$ implies $w = w'$.

Lemma 6. Given a frame $F = (W, \sim_1, \sim_2)$ and a point $w$ in it, $w$ is an $i$-dead-end if and only if for any valuation $\pi$ we have $(F, \pi) \models_w p \Rightarrow \Box_i p$.

We can then prove the results for this axiom. 

Lemma 7. $F \models \Box_1\Diamond_2 p \Rightarrow \Diamond_1\Box_2 p$ if and only if $F$ is such that every point $w$ is related by relation 1 to a 2-dead-end; i.e. for all $w \in W$ there exists a $w' \in W$, $w \sim_1 w'$, such that $[w']_{\sim_2} = \{w'\}$.

Proof. From right to left: consider any model $M$ such that every point sees via $\sim_1$ a 2-dead-end. Suppose $M \models_w \Box_1\Diamond_2 p$; so for every point $w'$ such that $w \sim_1 w'$ there must be a $w''$ such that $w' \sim_2 w''$ and $M \models_{w''} p$. But by assumption one of the $w'$ is a 2-dead-end, so for that point $w'' = w'$, and (by Lemma 6) $M \models_{w'} \Box_2 p$ for some $w' \in [w]_{\sim_1}$. Then $M \models_w \Diamond_1\Box_2 p$.

For the converse, consider any equivalence frame $F$ such that $F \models \Box_1\Diamond_2 p \Rightarrow \Diamond_1\Box_2 p$ and suppose, by contradiction, that the property above does not hold, i.e. there is a point $w$ such that no point of $[w]_{\sim_1}$ is a 2-dead-end. Consider the set $X = [w]_{\sim_1}$, the equivalence relation $\sim\ = \sim_1 \cap \sim_2$ and the quotient set $X/\!\sim$. Consider now the set $Y$ constructed by taking one and only one representative for each class in $X/\!\sim$. Consider the valuation $\pi(p) = Y$ and the model $M = (W, \sim_1, \sim_2, \pi)$. By construction we have $M \models_w \Box_1\Diamond_2 p$, since every $u \in [w]_{\sim_1}$ is $\sim_2$-related to the representative of its $\sim$-class. Then by our assumption we also have $M \models_w \Diamond_1\Box_2 p$. So there must be a point $w'$ such that $w \sim_1 w'$ and $M \models_{w'} \Box_2 p$. But since $w'$ by assumption is not a 2-dead-end, the equivalence class $[w']_{\sim_2}$ must contain more than $w'$ itself, and by construction $p$ is true at exactly one point of $[w']_{\sim_2}$ (the representative of $[w']_\sim$) and false at every $y \notin X$. So we have $M \not\models_{w'} \Box_2 p$ for every $w' \in [w]_{\sim_1}$, and so $M \not\models_w \Diamond_1\Box_2 p$, which is absurd. So for every point $w \in W$ there must be a 2-dead-end accessible from it via $\sim_1$.
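For small finite frames, the correspondence of Lemma 7 can be checked mechanically. The Python sketch below is our own illustration, not part of the paper: an equivalence relation on $W = \{0, \dots, n-1\}$ is encoded as a list of class labels, axiom_valid tries every valuation of $p$, and frame_condition tests the property of Lemma 7. By the lemma the two functions should agree on every equivalence frame; such a check says nothing about completeness, which remains open.

```python
from itertools import combinations
from typing import List, Set

# An equivalence relation on W = {0, ..., n-1} is encoded by class labels:
# u ~ w  iff  labels[u] == labels[w].
Labels = List[int]


def eq_class(labels: Labels, w: int) -> Set[int]:
    """The equivalence class [w] of w."""
    return {u for u in range(len(labels)) if labels[u] == labels[w]}


def is_dead_end(labels: Labels, w: int) -> bool:
    """Definition 1: w sees only itself under this relation."""
    return eq_class(labels, w) == {w}


def frame_condition(r1: Labels, r2: Labels) -> bool:
    """Lemma 7: every point sees, via ~1, some 2-dead-end."""
    return all(any(is_dead_end(r2, u) for u in eq_class(r1, w))
               for w in range(len(r1)))


def axiom_valid(r1: Labels, r2: Labels) -> bool:
    """Brute force: []_1 <>_2 p => <>_1 []_2 p holds at every point
    under every valuation of p."""
    worlds = range(len(r1))
    for size in range(len(r1) + 1):
        for chosen in combinations(worlds, size):
            true_at = set(chosen)  # the points where p holds
            dia2 = {u for u in worlds if eq_class(r2, u) & true_at}
            box2 = {u for u in worlds if eq_class(r2, u) <= true_at}
            for w in worlds:
                c1 = eq_class(r1, w)
                # antecedent: all of [w]_1 satisfies <>_2 p;
                # consequent: some point of [w]_1 satisfies []_2 p
                if c1 <= dia2 and not (c1 & box2):
                    return False
    return True
```

For instance, with r1 = [0, 0] and r2 = [0, 1] (agent 2's relation is the identity) both functions return True, while r1 = [0, 1] and r2 = [0, 0] make both return False.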

Completeness for the above remains an open problem. 

Conjecture 1. The logic $\mathrm{S5}_2 + \{\Box_1\Diamond_2 p \Rightarrow \Diamond_1\Box_2 p\}$ is sound and complete with respect to equivalence frames such that every point is related by relation 1 to a 2-dead-end; i.e. for all $w \in W$ there exists a $w' \in W$, $w \sim_1 w'$, such that $[w']_{\sim_2} = \{w'\}$.

The same happens for the axiom $\Box_1\Diamond_2 p \Rightarrow \Diamond_2\Box_1 p$. This axiom represents the situation in which it cannot be that agent 1 knows that agent 2 considers $p$ possible while agent 2 knows that agent 1 considers $\neg p$ possible.
Lemma 8. $F \models \Box_1\Diamond_2 p \Rightarrow \Diamond_2\Box_1 p$ if and only if $F$ is such that in every connected sub-frame either $\sim_1 = \mathrm{id}_W$ or $\sim_2 = \mathrm{id}_W$.

Proof. From left to right. This part of the proof is structured as follows:

1. We prove that $F \models \Box_1\Diamond_2 p \Rightarrow \Diamond_2\Box_1 p$ implies that any point $w \in W$ either sees via 1 a 2-dead-end, or sees via 2 a 1-dead-end.

2. We prove that if $F \models \Box_1\Diamond_2 p \Rightarrow \Diamond_2\Box_1 p$ and there is a point $w$ which is an $i$-dead-end, then $\sim_i = \mathrm{id}_W$ on the whole connected sub-frame generated by $w$, where $i \in \{1, 2\}$.

3. The two facts above together prove that if $F \models \Box_1\Diamond_2 p \Rightarrow \Diamond_2\Box_1 p$, then in every connected sub-frame either $\sim_1 = \mathrm{id}_W$ or $\sim_2 = \mathrm{id}_W$.

1) By contradiction, consider any connected equivalence frame $F$ in which a point $w \in W$ does not see via $i$ any $j$-dead-end, i.e. $[w']_{\sim_j} \neq \{w'\}$ for all $w' \in [w]_{\sim_i}$, for $i \neq j$, $i, j \in \{1, 2\}$; we prove that $F \not\models \Box_1\Diamond_2 p \Rightarrow \Diamond_2\Box_1 p$. To see this, consider the set $X = ([w]_{\sim_1} \cup [w]_{\sim_2}) \setminus \{w\}$, the equivalence relation $\sim\ = \sim_1 \cap \sim_2$ and the quotient set $X/\!\sim$. Consider now the set $Y$ defined by taking one representative $y$ for every equivalence class in $X/\!\sim$: the set $Y$ is such that for all distinct $y_1, y_2 \in Y$ we have $[y_1]_\sim \cap [y_2]_\sim = \emptyset$, and $\bigcup_{y \in Y} [y]_\sim = X$. Consider now the model $M = (F, \pi)$ given by the valuation $\pi(p) = Y$. By construction, in the model $M$, for any $x \in X$ there is a point accessible from $x$ via $\sim_2$ which satisfies $p$, and since by hypothesis $w$ is neither a 1-dead-end nor a 2-dead-end (as otherwise it would see itself as a dead-end), we have $M \models_w \Box_1\Diamond_2 p$. So by the validity of the axiom we also have $M \models_w \Diamond_2\Box_1 p$, i.e. there must be a $w' \in [w]_{\sim_2}$ such that $M \models_{w'} \Box_1 p$; but this is impossible, because by hypothesis $[w']_{\sim_1} \neq \{w'\}$, and by construction $p$ is true at just one point in $[w']_{\sim_1} \cap [w']_{\sim_2}$ and false at every point not in $X$. See Figure 4.

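2) Consider now a connected frame $F$ such that $F \models \Box_1\Diamond_2 p \Rightarrow \Diamond_2\Box_1 p$ and suppose, for example, that $w$ is a 1-dead-end; we want to prove that $\sim_1 = \mathrm{id}_W$ on the connected sub-frame generated by $w$⁶. If $w$ is also a 2-dead-end, then $\sim_1 = \sim_2 = \mathrm{id}_W$ on the generated frame, which gives us the result. If not, suppose that $\sim_1 \neq \mathrm{id}_W$, so there must be two points $w', w'' \in W$, $w' \neq w''$, such that $w' \sim_1 w''$. Since the frame is connected, without loss of generality assume $w \sim_2 w'$. Consider now the valuation $\pi(p) = \{x \mid x \in [w]_{\sim_2}, x \neq w'\} \cup \{w''\}$ and the model $M = (F, \pi)$ built on $F$ from $\pi$. Every point of $[w]_{\sim_2}$ other than $w'$ satisfies $p$, and $w'$ sees $w''$ via $\sim_1$, so $M \models_w \Box_2\Diamond_1 p$. Substituting $\neg p$ for $p$ in the axiom and contraposing gives the equivalent form $\Box_2\Diamond_1 p \Rightarrow \Diamond_1\Box_2 p$, so by validity of the axiom we also have $M \models_w \Diamond_1\Box_2 p$. Since $w$ is a 1-dead-end, we must have $M \models_w \Box_2 p$, which is a contradiction because $M \models_{w'} \neg p$.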

So we have that if the axiom is valid, then in every connected component one of the 
two relations is the identity. 

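From right to left. Consider any equivalence model $M$ whose underlying frame satisfies the property above and suppose that $M \models_w \Box_1\Diamond_2 p$. Suppose $\sim_1 = \mathrm{id}_W$ on the connected part containing $w$; then $M \models_w \Diamond_2 p$, so there is a $w' \in [w]_{\sim_2}$ such that $M \models_{w'} p$. But since $\sim_1 = \mathrm{id}_W$ on the connected part, we also have $M \models_{w'} \Box_1 p$. So $M \models_w \Diamond_2\Box_1 p$. Suppose now $\sim_2 = \mathrm{id}_W$. Then for every $w' \in [w]_{\sim_1}$ we have $M \models_{w'} p$, i.e. $M \models_w \Box_1 p$, and since $\sim_2$ is the identity we also have $M \models_w \Diamond_2\Box_1 p$.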

Again we can only conjecture completeness with respect to the above class of frames.
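The same brute-force idea applies to Lemma 8. The following Python sketch is again our own illustration with the same label encoding: components computes the connected components of $\sim_1 \cup \sim_2$, frame_condition checks that on each component one relation is the identity, and axiom_valid tries every valuation. Agreement on small frames is only a sanity check of the correspondence result, not evidence for Conjecture 2.

```python
from itertools import combinations
from typing import List, Set

Labels = List[int]  # u ~ w iff labels[u] == labels[w]


def eq_class(labels: Labels, w: int) -> Set[int]:
    return {u for u in range(len(labels)) if labels[u] == labels[w]}


def components(r1: Labels, r2: Labels) -> List[Set[int]]:
    """Connected components of the graph whose edges are ~1 and ~2."""
    seen: Set[int] = set()
    comps: List[Set[int]] = []
    for start in range(len(r1)):
        if start in seen:
            continue
        comp: Set[int] = set()
        stack = [start]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(eq_class(r1, u) | eq_class(r2, u))
        seen |= comp
        comps.append(comp)
    return comps


def frame_condition(r1: Labels, r2: Labels) -> bool:
    """Lemma 8: on each connected component, ~1 or ~2 is the identity."""
    def identity_on(labels: Labels, comp: Set[int]) -> bool:
        return all(eq_class(labels, u) == {u} for u in comp)
    return all(identity_on(r1, c) or identity_on(r2, c)
               for c in components(r1, r2))


def axiom_valid(r1: Labels, r2: Labels) -> bool:
    """Brute force: []_1 <>_2 p => <>_2 []_1 p under every valuation."""
    worlds = range(len(r1))
    for size in range(len(r1) + 1):
        for chosen in combinations(worlds, size):
            true_at = set(chosen)  # the points where p holds
            dia2 = {u for u in worlds if eq_class(r2, u) & true_at}
            box1 = {u for u in worlds if eq_class(r1, u) <= true_at}
            for w in worlds:
                if eq_class(r1, w) <= dia2 and not (eq_class(r2, w) & box1):
                    return False
    return True
```

For example, r1 = [0, 0, 1] with r2 = [0, 1, 1] forms a single component on which neither relation is the identity, and both functions return False.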

6 If w is a 2-dead-end then the argument is symmetric. 
[Figure 4: the set $[w]_{\sim_1} \cup [w]_{\sim_2}$]




Fig. 4. A model not satisfying the property of Lemma 8. Note that $[w]_{\sim_1} \neq \{w\}$ and $[w]_{\sim_2} \neq \{w\}$.



Conjecture 2. The logic $\mathrm{S5}_2 + \{\Box_1\Diamond_2 p \Rightarrow \Diamond_2\Box_1 p\}$ is sound and complete with respect to equivalence frames such that either $\sim_1 = \mathrm{id}_W$ or $\sim_2 = \mathrm{id}_W$ on every connected sub-frame.



5 Conclusions 

We have identified a number of non-trivial single-axiom extensions of $\mathrm{S5}_2$ which specify a mode of interaction between two agents, and proved correspondence, soundness and completeness with respect to the appropriate classes of frames. The main contribution of this paper lies in the identification of a spectrum of interactions above $\mathrm{S5}_2$.

Figure 5 represents graphically all the logics discussed so far, together with the corresponding semantic classes (the ones for which we only conjectured completeness are not included). In the figure, the logics are ordered strength-wise. So, the strongest logic is of course $\mathrm{Triv}_2$ (represented as $\mathrm{S5}_2 + \{\Diamond_1 p \Rightarrow \Box_2 p\}$), and the weakest is simply $\mathrm{S5}_2$. In between we have a few logic systems, the weakest of which are Catach’s logic and the logics given by the two axioms that we examined in Theorem 2 and Theorem 3. Note that these three logics are independent. Stronger extensions include logics in which the knowledge of one agent is included in the knowledge of the other, and combinations of these.

The fairly exhaustive analysis carried out in this paper permits an AI user with an interaction axiom in mind to refer to the above tables to identify the class of Kripke frames that gives completeness. For most of the logics above, decidability also follows, because most of them have the finite model property.

We have given conjectures about the two McKinsey axioms (Conjecture 1, Conjecture 2). In private correspondence, van der Hoek [9] has communicated a proof of Conjecture 1, but the other axiom is still an open problem at this stage. Solving this issue is part of our future work.

[Figure 5: the logics ordered by strength, from $\mathrm{S5}_2$ up to $\mathrm{S5}_2 + \{\Diamond_1 p \Rightarrow \Box_2 p\}$]

Fig. 5. The independent extensions of $\mathrm{S5}_2$ that can be obtained by adding axioms of the shapes $\Box\phi \Rightarrow \Box\phi$ and $\Box\phi \Rightarrow \Box\Box\phi$, and the corresponding classes of frames. Formulas not included in the table but obtainable from the schema above give completeness with respect to equivalence frames. The logics for which results are only conjectured are not included in the figure.

The results presented in this paper conceptually belong to a family of works in which the relation between different modalities in a single- or multi-agent setting is explored. Among the many works in the literature that deal with the interaction between different internal mental states, we would like to cite [8], in which van der Hoek, building upon previous work [12] by Kraus and Lehmann, extensively explores the relation between knowledge and belief. Turning to the relation between agent and environment, in these proceedings Wooldridge and Lomuscio capture the relation between visibility, perception, and knowledge by means of formal tools very similar to the ones presented in these pages [19]. Although the logics presented in those works aim at capturing static properties, an interesting line of research concerns designing algorithms defined on Kripke structures that model the evolution of internal mental states [6,16,11].
Abstract. Some agent architectures employ mental states such as belief, desire,
goal, and intention. We also know that one often has a belief about someone 
else’s belief (nested belief), and one’s action is decided based on the nested belief. 
However, to the best of our knowledge, there is no concrete agent architecture 
that employs nested beliefs for decision. The reason is simple: we do not have a 
good model of nested belief change. Hence, interesting technological questions 
are whether such a model can be devised or not, how it can be implemented, and 
how it can be used. In a previous paper, we proposed an algorithm for nested beliefs 
based on observability and logically characterized its output. Here, we propose 
another algorithm with improved expressiveness and efficiency. 



1 Introduction 

Some agent architectures [29,26] employ mental states such as belief, desire, goal, and intention. One’s static belief is often formalized by using a set of possible worlds in modal logic. $B_a\phi$ indicates that agent $a$ believes $\phi$, and is true if and only if $\phi$ holds in every world the agent takes into account. Theoretically, it is possible to extend such logic to represent dynamic belief change [24,34]. However, previous studies [32,19,30,14,16,9,10] do not explicitly specify how one’s belief about someone else’s belief (nested belief) should change.

In addition, transforming the computation of nested belief into the validity-checking 
problem of modal logic is discouraging, because validity checking is believed to be 
computationally intractable [7] even if we disregard belief change. This situation has 
deterred researchers from employing nested belief as a key component of their agent 
architectures. 

However, we know that one often has a belief about someone else’s belief and it 
affects one’s decision. If we can find a good model of dynamic nested belief, we expect 
to be able to present a better agent-oriented architecture that is significantly different 
from object-oriented architectures. 

Nested belief is expected to be reliable and useful when agents share experiences, because agents can guess one another’s beliefs based on such shared experiences. Communication among agents then becomes more efficient, because agents that use nested beliefs can skip explaining things the others are believed to know already. Although the formalization of one’s own belief change is still controversial, we can find many working systems that consistently update or revise their own beliefs (e.g., databases) for restricted domains. Thus, we can build practical systems even if we do not completely solve the controversial problems. Therefore, we started our study with the aim of providing an easy-to-use, tractable subset of human nested beliefs.

We found literature on nested belief in natural language processing [24,1,17,2]. These studies are interested in expressiveness in order to translate arbitrary English sentences into logical constraints and vice versa. Hence, they assume the existence of a powerful theorem prover to process ambiguous statements. While this literature can provide guidelines for our study, it does not fully satisfy our needs, for a number of reasons. First, artificial agents use a simpler, less ambiguous artificial
communication language. Second, an agent is not a logically perfect reasoner. System 
designers can presuppose the limitation of an agent’s intelligence. Third, agents can 
often send queries to other agents if necessary. The benefits of nested belief in artificial 
agents depend on the efficiency of the computation and the reliability of the guess versus 
the cost and delay of communication. 

Recently, we proposed a regressive algorithm for nested belief based on persistence and observability [12]. The algorithm empirically returns justifiable guesses with relatively low computational complexity. However, oversimplification causes salient problems and limits its applicability. This paper shows how we can solve the algorithm’s problems without losing its strong points. We also present a progressive algorithm that omits unnecessary computation and also incorporates these solutions.



1.1 Previous work: a regressive algorithm for nested belief 

Our previous algorithm presupposes the following multiagent environment. 

1. The environment can be expressed by a set of persistent propositional variables. That is, once $p$ becomes true or false, its truth value does not change unless an event interrupts it.

2. Time $t$ is discrete, and is represented by integers: 0, 1, 2, .... An instantaneous event may occur between time $t-1$ and $t$. No two events occur simultaneously. The event is denoted $e_t$.

3. Agents execute different actions (events). Agents observe that some events occur 
and some propositional variables hold. 

4. Agents have common knowledge about the observability and direct effects of events. 
The observability of an event or a proposition is expressed by a condition that enables 
one to observe it. 

5. Agents simply accept observed data, and update their beliefs to keep consistency by 
removing incompatible beliefs. They do not think of unobserved past events. 

We do not take into consideration the communication language the agents use. The 
expressiveness of the language affects the computational complexity of nested belief. If 
the language contains only assertions of one’s belief about primitive propositions, it is 
rather easy to modify the algorithm for the assertion. Van Linder et al. [34] discussed 
how one should revise one’s own belief by observation, communication, and default 
reasoning. In any case, the communication language is beyond the scope of this paper.
If an agent $a_0$ wants to use the algorithm, it has to provide an initial belief, an observed event $e_t$, and a set of observed propositions $Ob(t)$ for each $t = 1, \dots, m$. Then the algorithm ascertains whether $a_0$ should believe that $a_1$ believes that ... that $a_k$ believes $p$ (i.e., $B_{a_0} B_{a_1} \cdots B_{a_k} p$).

We characterized [12] the output by a semantic structure $T(m)$, which is a variant of the Kripke structure. $T(m)$ is obtained by a sequence of each agent’s belief changes. We defined a semantic relation $T(m) \models \phi$ by a nonstandard modal logic, and implemented the algorithm by a recursive function $\mathit{belief}(p, k, m)$ that returns:

- yes if $T(m) \models B_{a_0} B_{a_1} \cdots B_{a_k} p$,

- no if $T(m) \models B_{a_0} B_{a_1} \cdots B_{a_k} \neg p$,

- unknown otherwise.

The algorithm returns the output in polynomial time with respect to m and k if we 
avoid redundant computation by reusing computed function values. This efficiency is 
achieved by disregarding unobserved events and preconditions of events. It is trivial that 
the precondition of an event was true when the event occurred. The examination of the 
precondition is useful only when the fact that the precondition held at that time gives a 
very informative clue to a hidden fact. 

However, an agent does not always know the correct truth value of every proposition 
in the precondition. Therefore, the agent has to revise its belief about the past to make 
preconditions hold by introducing unobserved past events. This abductive inference is 
often intractable and fruitless. 

The algorithm computes nested beliefs by considering three factors: observation, 
effects, and memory. Figure 1 shows these factors. 

Example 1. R1 shows a situation in the real world. C1 shows your belief. You (Mr.
C) and Mr. B are in the same room (Room 1), and you can see Mr. B. Therefore, you 
believe Mr. B is in Room 1. This belief is based on observation. 




[Figure: situations R1, R2 and beliefs C1, C2, labelled “Memory & Effect”]



Fig. 1. Three factors for nested belief
Then Mr. B moves to Room 2. R2 shows the situation after B’s action. C2 shows your new belief. You cannot see Mr. B any longer, but you believe B is in Room 2 by the effect of B’s action. You also believe that B believes you are in Room 1 by memory. Here, one’s memory at time $t$ means one’s belief at time $t-1$. On the other hand, you do not know Ms. A is in Room 2. Hence, you do not expect Mr. B to see her there. ■

Agent $a$’s observability of a persistent property (propositional fluent) $p$ (e.g., B is in Room 1) is represented by a propositional formula $ob[a, p]$. If $p \wedge ob[a, p]$ holds, $a$ comes to believe $p$ by observing it. In order to guess beliefs from effects, we introduced $post(e)$ and $ob[a, e]$ for each instantaneous event $e$. $post(e)$ denotes the set of all propositional variables made true by $e$. Agent $a$ observes the occurrence of $e$ if $ob[a, e]$ holds in the state immediately before $e$’s occurrence.

In order to keep one’s belief consistent, $p$’s incompatible proposition set $N(\{p\})$ was introduced for each primitive proposition $p$. Each element of the set is a propositional variable that never holds whenever $p$ holds. Suppose you believe John is in the kitchen, and then you find John in the study. You have to retract the old belief to keep your belief consistent. In the same way, the algorithm ensures that $p$ and an element of $N(\{p\})$ do not hold at the same time in nested beliefs.
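The retraction step can be sketched for a single agent’s own, non-nested belief. This is a minimal Python illustration under our own assumptions, not the algorithm itself: beliefs are a set of variable names, N is a hypothetical incompatibility table for the John example, and observing $p$ adds $p$ and removes every element of $N(\{p\})$.

```python
from typing import Dict, Set

# Hypothetical domain: John is in exactly one room, so each location
# variable is incompatible with every other location of John.
ROOMS = ["kitchen", "study", "garden"]
N: Dict[str, Set[str]] = {
    f"loc(John)={r}": {f"loc(John)={s}" for s in ROOMS if s != r}
    for r in ROOMS
}


def observe(beliefs: Set[str], observed: Set[str]) -> Set[str]:
    """Accept the observed variables and retract every belief that is
    incompatible with one of them (the set N({p}) of each observed p)."""
    retracted: Set[str] = set()
    for p in observed:
        retracted |= N.get(p, set())
    return (beliefs - retracted) | observed
```

Observing loc(John)=study while believing loc(John)=kitchen and door_open leaves the beliefs loc(John)=study and door_open; beliefs unrelated to John’s location are untouched.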

The function $\mathit{belief}(p, k, m)$ checks $a_k$’s latest observation of $\{p\} \cup N(\{p\})$ and the expected effects of the latest event $e_m$ by examining $ob[a_k, p]$, $post(e_m)$, and $ob[a_k, e_m]$. The function examines these conditions by using $\mathit{belief}(\cdot, k-1, m)$ (or $\mathit{belief}(\cdot, k-1, m-1)$). If these rules give no information, it checks the memory $\mathit{belief}(p, k, m-1)$, the latest belief before the occurrence of $e_m$.

1.2 Impediments to applications and their solutions 

We often encounter the following practical problems when we try to use the algorithm. 

1. There is no way to treat knowledge-producing actions [28,24,4] and conditional
effects [27].

2. We have to introduce an artificial propositional variable in order to deny a 
proposition. In addition, strange answers are returned in certain situations. We 
remedy these problems by extending the notion of observation conditions. 

3. It sometimes takes a somewhat long time to confirm a trivial fact. We therefore 
propose a faster progressive algorithm. 

Here, we describe these problems in detail. 

The reason for the first problem is that we assumed that e always causes a fixed set of 
effects post(e). However, events may cause different effects depending on the situation. 
Even if you turn on a floor lamp, it will not shine if it is unplugged. Such effects are 
called conditional effects. One might think that we could exclude such cases by their 
preconditions, but it is often difficult to distinguish failure from success with unintended 
effects (or no effects). Here, we solve the problem by introducing efficacy conditions of 
an event, which are similar to the event’s preconditions but do not affect its occurrence. 
We will describe the conditions in the next section. 

If a software agent wants to know the list of all files in a directory, it will execute a command like ls or dir. Then the agent can sense the environment without changing it. Such an action is called a knowledge-producing action [28]. Such an action shows only a snapshot of a propositional fluent $p$. This means that a property $p$ becomes observable only at the instant immediately after the action’s occurrence. We can represent this property by introducing $e$ into $ob[a,p]$ (i.e., $ob[a,p] \stackrel{\mathrm{def}}{=} (e \wedge \cdots) \vee \cdots$).

A situation for which the algorithm returns strange answers is shown in Fig. 2. 

Example 2. R1 shows a situation where you (Mr. B) are in Room 1. B1 shows your belief: Ms. A is in Room 2 (by memory). R2 shows Room 2 after you entered it. B2 shows your new belief that Ms. A is not there. ■

The algorithm correctly guesses your belief revision about her location if we introduce a new proposition $\text{'loc(A)≠Room2'} \in N(\{\text{'loc(A)=Room2'}\})$ and its observability condition $ob[B, \text{'loc(A)≠Room2'}] \stackrel{\mathrm{def}}{=} \text{'loc(B)=Room2'}$. (Note that 'loc(A)=Room2' has nothing to do with a function. It is just the name of a propositional variable.) Then $B_B\,\text{'loc(A)≠Room2'}$ holds because B can observe 'loc(A)≠Room2'. Since 'loc(A)≠Room2' and 'loc(A)=Room2' are incompatible, $B_B\neg\text{'loc(A)=Room2'}$ holds. However, it is unnatural to introduce a propositional variable like 'loc(A)≠Room2' to deny a proposition. In this paper, we assume that $a$ believes $\neg p$ whenever $\neg p \wedge ob[a, p]$ holds (negative observation), so that such artificial propositions need not be introduced.



[Figure 2: “B moves to the next room.”]




Fig. 2. Expectation and belief revision 



There is a more serious problem: some deeper nested beliefs do not reflect this belief 
revision correctly. It is natural for you to expect to see Ms. A in Room 2 before you enter 
the room. You also expect that Ms. A will see you enter Room 2. However, you find she 
is not there. Then you must believe that she did not see you enter the room and that she 
does not know you are in Room 2. 

When we applied the algorithm to this example, we used the following rule:

$ob[A, \text{'enter(B,Room2)'}] \stackrel{\mathrm{def}}{=} \text{'loc(A)=Room1'} \vee \text{'loc(A)=Room2'}$.

This rule means that Ms. A can see Mr. B enter Room 2 if she is in Room 1 or Room 2 immediately before the event. ('enter(B,Room2)' is not a first-order term but an event name given for readability.) Then, according to the algorithm, you come to believe that she believes that you are in Room 2, because of your expectation that Ms. A will see you enter Room 2. The algorithm does not have a mechanism to retract this expectation.

The algorithm used the previous belief B1 for $ob[A, \text{'enter(B,Room2)'}]$, but it should have used the revised belief B2. We remedy the problem by extending $e$’s observability condition. The new observability condition is applied to the state immediately after $e$’s occurrence. In addition, the condition may contain a subformula $\bullet\phi$, which means that $\phi$ is evaluated in the previous state. If we use

$ob[A, \text{'enter(B,Room2)'}] \stackrel{\mathrm{def}}{=} \bullet\text{'loc(A)=Room1'} \vee \text{'loc(A)=Room2'}$,

the above strangeness can be removed: under the revised belief B2, Ms. A was not believed to be in Room 1 before the event and is no longer believed to be in Room 2, so the condition does not hold.

Lastly, we found that the algorithm executes unnecessary computation. Even with the reuse of computed values, it sometimes takes a somewhat long time to confirm the trivial fact that irrelevant events did not affect a nested belief. However, it is not always easy to distinguish irrelevant events from relevant ones, because events can affect a nested belief indirectly through observability conditions.



2 Methodology 

One’s belief is often modeled by accessibility among a set of possible worlds (Kripke structure). Tree structures obtained by unraveling Kripke structures are suitable for representing dynamic behaviors of knowledge and belief [32,31,22].

In [12,13], we proposed a minimum coherent tree $T(t)$ (Fig. 3) to represent the set of all nested beliefs at time $t$. Its root has just one outgoing edge for each agent. Each descendant of the root also has just one outgoing edge for each agent unless its incoming edge is labelled with that agent. This means each agent considers just one possible world.

Since these conditions severely restrict expressiveness, we employed an alternate 
nonstandard structure [7] in order to improve its expressiveness. Each vertex of the 
above minimum coherent tree corresponds to a possible world in an alternate nonstandard 
structure. The structure allows us to represent a world where neither proposition $p$ nor
its negation $\neg p$ holds.

In this paper, we use a multiagent extension of partial modal logic [15] for simplicity. 
The alternate nonstandard structure was originally introduced for alleviating logical 
omniscience, but we used only a part of the complicated structure. We will discuss the 
relationship among these logics in another paper. 

In a minimum coherent tree based on partial modal logic, the truth assignment of 
each world is a function mapping each propositional variable to {True, False, Null}. If 
propositional variable p is evaluated Null in a world, then this means we do not have 
enough information to decide whether p is true or false in the world. 

The tree has a unique vertex for every finite sequence of agent names $a_0, a_1, a_2, \dots, a_k$ without successive repetition (i.e., $a_h \neq a_{h-1}$). For example, a sequence of agent names $(a, c, b)$ identifies a unique vertex $acb$ in $T(t)$. $v \cdot a$ represents $v$’s child with an edge labeled $a$ (the $a$-child of $v$). Each vertex corresponds to a possible world of partial modal logic. Vertex $v$’s truth assignment is denoted $\pi(v, t)$, and is defined inductively. It depends on its predecessor $v$ in $T(t-1)$ and $v$’s parent in $T(t)$.

Fig. 3. A minimum coherent tree

For the following discussion, we divide the set of all propositional variables $\Phi$ into two disjoint subsets, $\Phi_p$ and $\Phi_e$, where $\Phi_p$ is the set of persistent properties (propositional fluents) and $\Phi_e$ is the set of instantaneous events. We assume $e_t\ (\in \Phi_e)$ is true only at time $t$. Then we use the following definitions.

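$\pi(v, t): \Phi \to \{\mathit{True}, \mathit{False}, \mathit{Null}\}$.

$(T(t), v) \models p$ iff $\pi(v, t)(p) = \mathit{True}$.

$(T(t), v) \models \neg p$ iff $\pi(v, t)(p) = \mathit{False}$.

$(T(t), v) \models \bullet\phi$ iff $(T(t-1), v) \models \phi$.

$(T(t), v) \models \neg\bullet\phi$ iff $(T(t-1), v) \models \neg\phi$.

$(T(t), v) \models \phi \wedge \psi$ iff $(T(t), v) \models \phi$ and $(T(t), v) \models \psi$.

$(T(t), v) \models \neg(\phi \wedge \psi)$ iff $(T(t), v) \models \neg\phi$ or $(T(t), v) \models \neg\psi$.

$(T(t), v) \models B_a\phi$ iff $(T(t), v \cdot a) \models \phi$.

$(T(t), v) \models \neg B_a\phi$ iff $(T(t), v \cdot a) \models \neg\phi$.

$(T(t), v) \models \phi \hookrightarrow \psi$ iff $(T(t), v) \not\models \phi$ or $(T(t), v) \models \psi$.

$(T(t), v) \models \neg(\phi \hookrightarrow \psi)$ iff $(T(t), v) \models \phi$ and $(T(t), v) \models \neg\psi$.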

The new implication $\phi \hookrightarrow \psi$ is a variant of the strong implication defined in [7]. We will discuss this simplification formally in another paper, and explain why we use it. In any case, this implication plays an important role in formalizing domain knowledge. For example, negative observation can be represented by $\neg p \wedge ob[a, p] \hookrightarrow B_a\neg p$. Material implication $\phi \Rightarrow \psi$, defined by $\neg(\phi \wedge \neg\psi)$, does not hold when $\phi$ and $\psi$ are Null (no information), but $\phi \hookrightarrow \psi$ holds. In the following discussion, we use $\phi \vee \psi$ as shorthand for $\neg(\neg\phi \wedge \neg\psi)$.
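The clauses above translate into two mutually recursive checks, one for $\models \phi$ and one for $\models \neg\phi$; keeping them separate is what lets a variable be neither true nor false. The following Python sketch is our own illustration, not the authors’ implementation. Assumptions: agent names are single characters, a vertex is the string $a_0 a_1 \cdots a_h$, pi[t][v] maps a variable to True or False (a missing entry is Null), $\neg\neg\phi$ is treated as $\phi$, and $B_a$ at a vertex whose incoming edge is labelled $a$ is evaluated at that vertex itself.

```python
from typing import Dict, Optional, Tuple

# pi[t][v][p] is True or False; a missing entry means Null (no information).
Assignment = Dict[int, Dict[str, Dict[str, bool]]]
# ("var", p) | ("not", f) | ("and", f, g) | ("B", a, f) | ("imp", f, g) | ("prev", f)
Formula = Tuple


def var(p: str) -> Formula:
    return ("var", p)


def neg(f: Formula) -> Formula:
    return ("not", f)


def conj(f: Formula, g: Formula) -> Formula:
    return ("and", f, g)


def disj(f: Formula, g: Formula) -> Formula:
    return neg(conj(neg(f), neg(g)))  # shorthand used in the text


def material(f: Formula, g: Formula) -> Formula:
    return neg(conj(f, neg(g)))  # f => g as not(f and not g)


def child(v: str, a: str) -> str:
    # Assumption: no a-edge leaves a vertex whose incoming edge is labelled a,
    # so B_a is then evaluated at the vertex itself.
    return v if v.endswith(a) else v + a


def value(pi: Assignment, v: str, t: int, p: str) -> Optional[bool]:
    return pi.get(t, {}).get(v, {}).get(p)


def sat(pi: Assignment, v: str, t: int, f: Formula) -> bool:
    """(T(t), v) |= f"""
    op = f[0]
    if op == "var":
        return value(pi, v, t, f[1]) is True
    if op == "not":
        return sat_neg(pi, v, t, f[1])
    if op == "and":
        return sat(pi, v, t, f[1]) and sat(pi, v, t, f[2])
    if op == "B":
        return sat(pi, child(v, f[1]), t, f[2])
    if op == "imp":  # strong implication
        return not sat(pi, v, t, f[1]) or sat(pi, v, t, f[2])
    if op == "prev":  # the previous-state operator
        return sat(pi, v, t - 1, f[1])
    raise ValueError(f"unknown operator: {op}")


def sat_neg(pi: Assignment, v: str, t: int, f: Formula) -> bool:
    """(T(t), v) |= not f"""
    op = f[0]
    if op == "var":
        return value(pi, v, t, f[1]) is False
    if op == "not":  # assumption: not not f is supported exactly when f is
        return sat(pi, v, t, f[1])
    if op == "and":
        return sat_neg(pi, v, t, f[1]) or sat_neg(pi, v, t, f[2])
    if op == "B":
        return sat_neg(pi, child(v, f[1]), t, f[2])
    if op == "imp":
        return sat(pi, v, t, f[1]) and sat_neg(pi, v, t, f[2])
    if op == "prev":
        return sat_neg(pi, v, t - 1, f[1])
    raise ValueError(f"unknown operator: {op}")
```

For example, at a root vertex "b" whose assignment says nothing about x and y, sat(pi, "b", 1, ("imp", var("x"), var("y"))) is True while sat(pi, "b", 1, material(var("x"), var("y"))) is False, matching the remark about Null above.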



2.1 Formalization 

Here, we give the formal definition of $\pi(v, t)$.

The efficacy condition of an event is denoted $ec[e, (\neg)p]$, where $e \in \Phi_e$ and $p \in \Phi_p$. If the condition holds in the previous state, $e$ makes $(\neg)p$ hold. For example, $ec[\mathit{turn\_on}, \mathit{shine}] \stackrel{\mathrm{def}}{=} (\mathit{unplugged} \vee \mathit{power\_failure} \hookrightarrow \mathit{false})$ means that $\mathit{shine}$ becomes true unless $\mathit{unplugged}$ or $\mathit{power\_failure}$ holds. In addition, for consistency, we assume $ec[e, p] \wedge (ec[e, q] \vee ec[e, \neg p])$ never holds for any $q \in N(\{p\})$. However, observation may override the expectation. Suppose robot A plugged in a floor lamp and left the room. Then robot B unplugged it to use a vacuum cleaner. Later, A returned to the room and tried to turn on the lamp without looking at the plug. According to the above efficacy condition, A expects that the lamp will shine. However, A finds that it does not shine. Thus A believes that the lamp does not shine. If B is observing A’s behavior, B can correctly guess that A’s expectation is overridden by observation.

Now, we can define $\pi(v, t)$ in two steps: 1) update by $ec[e, \cdot]$ and 2) update by $ob[a, \cdot]$. Since $\pi(v, t)$ does not depend on any vertex outside the path from the root of $T(t)$ to $v = a_0 \cdots a_h$, we write $\pi(h, t)$ instead of $\pi(a_0, \dots, a_h, t)$ for simplicity. In addition, we use $\pi(h, t) \models \phi$ as an abbreviation of $(T(t), a_0 \cdots a_h) \models \phi$. In order to define $\pi(h, t)$, we introduce the following sets.

- $H(h, t)\ (\subseteq \Phi_e)$ is the set of events that happened at $\pi(h, t)$.

- $T(h, t)\ (\subseteq \Phi_p)$ is the set of persistent propositions true at $\pi(h, t)$.

- $F(h, t)\ (\subseteq \Phi_p)$ is the set of persistent propositions false at $\pi(h, t)$.

Then we define $\pi(h, t)$ as follows.

- $\pi(h, t)(p) = \mathit{True}$ if $p \in H(h, t) \cup T(h, t)$.

- $\pi(h, t)(p) = \mathit{False}$ if $p \in F(h, t)$.

- Otherwise, $\pi(h, t)(p) = \mathit{Null}$.

Now, we define these sets more formally. First, we define $H(h, t)$. If $a_h$ observes an event $e$ according to $\pi(h-1, t)$, $a_h$ believes its occurrence at time $t$.

- $H(0, t) \stackrel{\mathrm{def}}{=} \{e_t\}$.

- $H(h(>0), t) \stackrel{\mathrm{def}}{=} \{e\ (\in H(h-1, t)) \mid \pi(h-1, t) \models ob[a_h, e]\}$.

Second, we define the expected effects of an event $e_t$. If $a_h$ observes an event $e$ according to $\pi(h-1, t)$, $a_h$ believes it makes $p$ true when $a_h$ believes that $ec[e, p]$ was true at that time. The sets $E^+(h, t)$ and $E^-(h, t)$ indicate such effects.

- $E^+(h, 0) \stackrel{\mathrm{def}}{=} \{\}$, $E^-(h, 0) \stackrel{\mathrm{def}}{=} \{\}$.

- $E^+(h, t(>0)) \stackrel{\mathrm{def}}{=} \{p\ (\in \Phi_p) \mid e \in H(h, t) \text{ and } \pi(h, t-1) \models ec[e, p]\}$.

- $E^-(h, t(>0)) \stackrel{\mathrm{def}}{=} \{p\ (\in \Phi_p) \mid e \in H(h, t) \text{ and } \pi(h, t-1) \models ec[e, \neg p]\}$.

- $XE(h, t) \stackrel{\mathrm{def}}{=} E^-(h, t) \cup N(E^+(h, t))$,

where $N(\{p_1, \dots, p_n\}) \stackrel{\mathrm{def}}{=} N(\{p_1\}) \cup \cdots \cup N(\{p_n\})$.
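The effect sets can be sketched in Python for the floor-lamp example. This is our own illustration under simplifying assumptions: a belief state is the pair (T, F), a variable in neither set is Null, efficacy conditions are written as Python predicates on the previous state instead of formulas evaluated at $\pi(h, t-1)$, and $N(\{\mathit{shine}\}) = \{\mathit{dark}\}$ is a hypothetical incompatibility. The strong implication in $ec[\mathit{turn\_on}, \mathit{shine}]$ is read as holding unless unplugged or power_failure is believed true, so an agent that has not looked at the plug still expects the lamp to shine.

```python
from typing import Callable, Dict, Set, Tuple

# A belief state (T, F): variables believed true / believed false.
# A variable in neither set is Null.
State = Tuple[Set[str], Set[str]]
Condition = Callable[[State], bool]


def lamp_ec(prev: State) -> bool:
    """ec[turn_on, shine], read as: holds unless unplugged or
    power_failure is believed true in the previous state."""
    believed_true, _ = prev
    return not ({"unplugged", "power_failure"} & believed_true)


# Hypothetical toy domain; EC_NEG[(e, p)] would stand for ec[e, not p].
EC_POS: Dict[Tuple[str, str], Condition] = {("turn_on", "shine"): lamp_ec}
EC_NEG: Dict[Tuple[str, str], Condition] = {}
N: Dict[str, Set[str]] = {"shine": {"dark"}}


def n_of(props: Set[str]) -> Set[str]:
    """N({p1, ..., pn}) = N({p1}) | ... | N({pn})."""
    out: Set[str] = set()
    for p in props:
        out |= N.get(p, set())
    return out


def expected_effects(happened: Set[str], prev: State) -> Tuple[Set[str], Set[str], Set[str]]:
    """E+, E- and XE for the events in H(h, t), judged in the previous state."""
    e_plus = {p for (e, p), cond in EC_POS.items() if e in happened and cond(prev)}
    e_minus = {p for (e, p), cond in EC_NEG.items() if e in happened and cond(prev)}
    return e_plus, e_minus, e_minus | n_of(e_plus)
```

Robot A, for whom unplugged is Null, gets E+ = {shine}; robot B, who believes unplugged, expects no effect from the same event.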

Third, we define the observation of a persistent proposition $p$. We represent $a_0$’s own observation at time $t$ by the following sets. They are inputs to the algorithm.

- $O^+(0, t)$ is the set of all true propositional variables $a_0$ observed.

- $O^-(0, t)$ is the set of all false propositional variables $a_0$ observed.

We use the following definitions to guess other agents’ observation. The sets $O^+(h, t)$ and $O^-(h, t)$ indicate all the propositions $a_h$ observed according to $\pi(h-1, t)$.

- $O^+(h(>0), t) \stackrel{\mathrm{def}}{=} \{p\ (\in T(h-1, t)) \mid \pi(h-1, t) \models ob[a_h, p]\}$.

- $O^-(h(>0), t) \stackrel{\mathrm{def}}{=} \{p\ (\in F(h-1, t)) \mid \pi(h-1, t) \models ob[a_h, p]\}$.

- $XO(h, t) \stackrel{\mathrm{def}}{=} O^-(h, t) \cup N(O^+(h, t))$.

Since one’s belief is updated by effects and observation, $T(h, t)$ and $F(h, t)$ are consistently updated by using the above sets. First, they are updated by the expected effects of $e_t$. Then, they are updated by observed propositions.

- $T(h, 0) \stackrel{\mathrm{def}}{=} O^+(h, 0)$, $F(h, 0) \stackrel{\mathrm{def}}{=} XO(h, 0)$.

- $T(h, t(>0)) \stackrel{\mathrm{def}}{=} (((T(h, t-1) - XE(h, t)) \cup E^+(h, t)) - XO(h, t)) \cup O^+(h, t)$.

- $F(h, t(>0)) \stackrel{\mathrm{def}}{=} (((F(h, t-1) - E^+(h, t)) \cup XE(h, t)) - O^+(h, t)) \cup XO(h, t)$.

We can rewrite this two-step update as a one-step update:

- $T(h, t(>0)) = (T(h, t-1) - F^+(h, t)) \cup T^+(h, t)$,

- $F(h, t(>0)) = (F(h, t-1) - T^+(h, t)) \cup F^+(h, t)$, where

- $T^+(h, t) \stackrel{\mathrm{def}}{=} (E^+(h, t) - XO(h, t)) \cup O^+(h, t)$,

- $F^+(h, t) \stackrel{\mathrm{def}}{=} (XE(h, t) - O^+(h, t)) \cup XO(h, t)$.
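The claim that the two-step and one-step updates coincide can be checked exhaustively on a small set of variables. The Python sketch below is our own illustration: the six input sets are passed in directly rather than computed from $\pi$, and truth_value mirrors the definition of $\pi(h, t)(p)$, with None standing for Null.

```python
from itertools import combinations, product
from typing import List, Optional, Set, Tuple

Sets = Tuple[Set[str], Set[str]]


def two_step(t_prev: Set[str], f_prev: Set[str], e_plus: Set[str],
             xe: Set[str], o_plus: Set[str], xo: Set[str]) -> Sets:
    """Effects first, then observation (the definition of T(h,t), F(h,t))."""
    new_t = (((t_prev - xe) | e_plus) - xo) | o_plus
    new_f = (((f_prev - e_plus) | xe) - o_plus) | xo
    return new_t, new_f


def one_step(t_prev: Set[str], f_prev: Set[str], e_plus: Set[str],
             xe: Set[str], o_plus: Set[str], xo: Set[str]) -> Sets:
    """The same update expressed through T+(h,t) and F+(h,t)."""
    t_plus = (e_plus - xo) | o_plus
    f_plus = (xe - o_plus) | xo
    return (t_prev - f_plus) | t_plus, (f_prev - t_plus) | f_plus


def truth_value(p: str, happened: Set[str], true_set: Set[str],
                false_set: Set[str]) -> Optional[bool]:
    """pi(h,t)(p): True, False, or None for Null."""
    if p in happened or p in true_set:
        return True
    if p in false_set:
        return False
    return None


def equivalent_on(universe: List[str]) -> bool:
    """Compare both updates for every choice of the six input sets."""
    subsets = [set(c) for r in range(len(universe) + 1)
               for c in combinations(universe, r)]
    return all(two_step(*args) == one_step(*args)
               for args in product(subsets, repeat=6))
```

Over two variables this compares all 4^6 = 4096 combinations. The equality holds for arbitrary sets: any element that the one-step form keeps but the two-step form removes from the previous state is put back by the final union with $O^+$ (for $T$) or $XO$ (for $F$).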

Thus, p’s truth value is updated only if p £ ( T + (h , t) — T(h , t — 1)) U ( F + (h , t) — 
F(h, £ — 1)). We introduce a function log(p, £) that gives the history of ao’ s own belief 
about p up to time £. The function records only time points where the belief changed. 

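- $\mathit{log}(p,t) \stackrel{\mathrm{def}}{=} \mathit{cons}((\mathit{True},t), \mathit{log}(p,t-1))$ if $t > 0$ and $p \in T^+(0,t) - T(0,t-1)$, or $t = 0$ and $p \in T(0,t)$.

- $\mathit{log}(p,t) \stackrel{\mathrm{def}}{=} \mathit{cons}((\mathit{False},t), \mathit{log}(p,t-1))$ if $t > 0$ and $p \in F^+(0,t) - F(0,t-1)$, or $t = 0$ and $p \in F(0,t)$.

- Otherwise, $\mathit{log}(p,t) \stackrel{\mathrm{def}}{=} \mathit{nil}$ for $t < 0$, and $\mathit{log}(p,t) \stackrel{\mathrm{def}}{=} \mathit{log}(p,t-1)$ for $t \geq 0$.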
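The definition of $\mathit{log}(p,t)$ can be read as a single pass over time that emits an entry only when $p$ newly enters $T(0,\cdot)$ or $F(0,\cdot)$. A minimal Python sketch, assuming the per-time sets `T[s]`, `F[s]`, `T_plus[s]` and `F_plus[s]` for $a_0$ are already available as lists indexed by time (a hypothetical input layout), is:

```python
from typing import List, Sequence, Set, Tuple

Change = Tuple[bool, int]  # (new truth value, time of change)


def belief_log(p: str, T: Sequence[Set[str]], F: Sequence[Set[str]],
               T_plus: Sequence[Set[str]], F_plus: Sequence[Set[str]], t: int) -> List[Change]:
    """log(p, t) for a_0: change points up to time t, newest first (empty if t < 0)."""
    changes: List[Change] = []  # built oldest first, reversed at the end
    for s in range(t + 1):
        if s == 0:
            became_true, became_false = p in T[0], p in F[0]
        else:
            became_true = p in T_plus[s] and p not in T[s - 1]
            became_false = p in F_plus[s] and p not in F[s - 1]
        if became_true:  # the True clause is tried first, as in the definition
            changes.append((True, s))
        elif became_false:
            changes.append((False, s))
    return changes[::-1]


# a_0 believes "a" at t=0, learns it is false at t=2, and true again at t=4.
T = [{"a"}, {"a"}, set(), set(), {"a"}]
F = [set(), set(), {"a"}, {"a"}, set()]
T_plus = [set(), set(), set(), set(), {"a"}]
F_plus = [set(), set(), {"a"}, set(), set()]
print(belief_log("a", T, F, T_plus, F_plus, 4))  # [(True, 4), (False, 2), (True, 0)]
```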

2.2 A Progressive Algorithm 

Now we describe an algorithm for checking whether $p \in T(k,m)$ holds. The regressive algorithm starts by checking whether $p \in T^+(k,m)$ or $p \in F^+(k,m)$ holds. If it does, the algorithm terminates immediately. The progressive algorithm, on the other hand, starts by checking whether $p \in T(k,m-1)$ or $p \in F(k,m-1)$ holds. That is, the progressive algorithm recalls the past history before it checks the latest observation.

In the case of Example 1, C believes B believes C is still in Room 1. If C employs 
the regressive algorithm, C starts by checking if B can see C’s location now. Since it is 
false, C checks if B thinks the latest action affected C’s location. It is also false, and C 
checks if B saw C’s location before the latest action. Then C finds that B saw C in Room 1, and believes B believes C is in Room 1.

On the other hand, the progressive algorithm starts by checking B’s previous belief, 
and finds that B believes C is in Room 1. Then it checks if B retracts the belief, but 
neither the latest action nor the latest observation affects the belief. Hence, C believes 
B believes C is in Room 1. 

We expected the regressive algorithm to be faster than the progressive one, but found that, if we use a compact representation of belief histories, the progressive one is much faster than the regressive one. The progressive algorithm computes a belief and exploits it to skip identical belief states until it reaches the next belief change. The regressive algorithm, on the other hand, has no such hint until it reaches the answer.

The progressive algorithm is shown in Fig. 4. It uses the functions defined in Fig. 5. Each function checks if a given proposition $p$ is an element of a certain set. Some of these functions require the value of $\pi(h-1,t)(p)$ for $h > 0$. This value is given in their last argument, base. For $h = 0$, base is False.

The main function complog(p, h, t) computes how $B_{a_0} B_{a_1} \cdots B_{a_h} p$ has changed since the beginning by examining new information with newTrue and newFalse (Fig. 5) at each time point, and finally returns a list that records how the nested belief changed up to $t$. Each element of the list has the form $(x, t)$ where $x = \mathit{True}$ or $x = \mathit{False}$, meaning that $p$'s truth value changed to $x$ at time $t$. The list is sorted by $t$, and the element with the largest $t$ comes first. For instance, when the list is [(False, 5), (True, 3)], $p$ was Null before $t = 3$, was True at $t = 3, 4$, and was False for $5 \le t \le m$. For this list, the algorithm uses ordinary list operations: head, tail, and cons. Note that the returned list contains only belief change points, so we can expect it to occupy less space than the record of all computed values of $\mathit{belief}(p, k, \cdot)$ in the regressive algorithm. Since the progressive algorithm adds data only when the belief changes, recording and accessing the computed values for reuse take less time. The algorithm also omits unnecessary computation. For example, newTrue(p, h, t) is not checked when $\pi(h,t)(p) = \mathit{True}$ holds, because it cannot lead to a belief change. Hence, the progressive algorithm is expected to be more efficient than the regressive one.
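The list example above can be made concrete. The following Python sketch (hypothetical helpers `value_at` and `record`) stores only change points, newest first, reads the value at any time by scanning to the first entry not later than that time, and extends the list only when the belief actually changes, in the spirit of complog's cons-on-change. Python lists stand in for ML cons lists here, so `record` copies the list; a real cons list would make it constant-time.

```python
from typing import List, Optional, Tuple

Change = Tuple[bool, int]  # (new truth value, time of change), newest first


def value_at(changes: List[Change], t: int) -> Optional[bool]:
    """Truth value at time t; None plays the role of Null (no belief yet)."""
    for value, s in changes:  # newest first, so the first s <= t is the current value
        if s <= t:
            return value
    return None


def record(changes: List[Change], value: Optional[bool], t: int) -> List[Change]:
    """Prepend (value, t) only when the belief changes; a belief never reverts to Null here."""
    current = changes[0][0] if changes else None
    if value is None or value == current:
        return changes
    return [(value, t)] + changes


history: List[Change] = []
for t, v in enumerate([None, None, None, True, True, False, False, False]):
    history = record(history, v, t)
print(history)  # [(False, 5), (True, 3)]
```

Eight time points collapse to two entries, and `value_at(history, 4)` still recovers True.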

We need a function that decomposes a given formula into its constituents. The function hold($\phi$, h, t) decomposes $\phi$ and evaluates $\pi(h,t) \models \phi$. v_part($(V_0, t_0)$) returns $V_0$. limit(t, $[(V_0,t_0), \ldots, (V_j,t_j)]$) removes all elements $(V_i, t_i)$ that satisfy $t_i > t$. bellog is a reuse version of complog and calls complog if necessary. update(p, h, (log, t)) records the answer log, which records all changes up to $t$, using $p$ and $h$ as search keys. lookup(p, h) finds the recorded log.
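The reuse machinery can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the memo table is a dictionary keyed by (p, h), lookup is assumed to report a missing record as upto = -1, and `complog` is passed in as a parameter so that the example can use a stub. The names mirror Fig. 4, but the code is not the authors' implementation.

```python
from itertools import dropwhile
from typing import Callable, Dict, List, Tuple

Change = Tuple[bool, int]
Log = List[Change]  # newest first


def v_part(entry: Change) -> bool:
    return entry[0]


def limit(t: int, log: Log) -> Log:
    """Drop the entries later than t; the log is sorted newest first."""
    return list(dropwhile(lambda entry: entry[1] > t, log))


class BeliefMemo:
    """Stores (log, upto) per (p, h); a miss is reported as upto = -1 (assumption)."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[str, int], Tuple[Log, int]] = {}

    def lookup(self, p: str, h: int) -> Tuple[Log, int]:
        return self._table.get((p, h), ([], -1))

    def update(self, p: str, h: int, log: Log, upto: int) -> None:
        self._table[(p, h)] = (log, upto)


def bellog(memo: BeliefMemo, complog: Callable[[str, int, int], Log],
           p: str, h: int, m: int) -> Log:
    log, upto = memo.lookup(p, h)
    if upto < m:                       # (p, h, m) is not computed yet
        new_log = complog(p, h, m)
        memo.update(p, h, new_log, m)  # the new record covers t = m
        return new_log
    return limit(m, log)               # reuse the record, cut back to time m


calls: List[int] = []
full: Log = [(False, 5), (True, 3)]


def stub_complog(p: str, h: int, m: int) -> Log:
    calls.append(m)  # count how often the expensive computation runs
    return limit(m, full)


memo = BeliefMemo()
print(bellog(memo, stub_complog, "p", 1, 10))  # computed by the stub
print(bellog(memo, stub_complog, "p", 1, 4))   # [(True, 3)], reused without a new call
```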

The following functions directly treat domain knowledge. 

- rival(p) returns $N(\{p\})$.

- obs(p, h, t) checks if $\pi(h-1,t) \models ob[a_h,p]$ holds by using hold.

- obsall(h, t) checks if $e^t \in H(h,t)$ holds by using hold($ob[a_i, e^t]$, i-1, t) for all $1 \le i \le h$.

- efc(p, h, t) checks if $\pi(h,t-1) \models ec[e^t,p]$ holds.

- efcneg(p, h, t) checks if $\pi(h,t-1) \models ec[e^t,\neg p]$ holds.

The correctness of the algorithm is formally specified by the next theorem. 

Theorem 1. The progressive algorithm satisfies the following properties unless there is an $h$ such that $a_h = a_{h-1}$ in $v = a_0 \cdots a_k$.

- complog(p, k, m) = [] iff $\pi(k,m) \not\models p$ and $\pi(k,m) \not\models \neg p$.

- head(complog(p, k, m)) = (True, _) iff $\pi(k,m) \models p$.

- head(complog(p, k, m)) = (False, _) iff $\pi(k,m) \models \neg p$.

It is not difficult to prove this theorem with respect to the above semantic structure
by mathematical induction. As an induction hypothesis, we assume complog(q, h, t)
DOMAIN KNOWLEDGE:

ob[a,p], ob[a,e], ec[e,p], ec[e,¬p], N({p})

INPUT:

a1, ..., ak, p    for B_a0 B_a1 ... B_ak p
e^t: the event observed by a0 at time t, where 1 <= t <= m
log(q,t): q's truth value at time t in a0's belief
OUTPUT: complog(p,k,m)

// type 'list' = [(V0,t0), ..., (Vn,tn)]
// type 'prop' = primitive proposition
// type 'tfn'  = True or False or Null
function complog(p: prop, k: int, m: int): list =
  if (k=0) then return log(p,m)
  else { list past := if (m=0) then [] else bellog(p,k,m-1);
         tfn base := compBase(p,k,m);
         tfn prev := if (m=0 or past=[]) then Null
                     else v_part(head(past));
         case prev of
           True:
             if newFalse(p,k,m,base) then
               return cons((False,m), past);
             else return past;
           False:
             if newTrue(p,k,m,base) then
               return cons((True,m), past);
             else return past;
           Null:
             if newTrue(p,k,m,base) then
               return cons((True,m), past);
             else if newFalse(p,k,m,base) then
               return cons((False,m), past);
             else return past;
       }

function bellog(p: prop, k: int, m: int): list =
  <log,upto> := lookup(p,k);        // Find a record for reuse.
  if (upto < m) {                   // (p,k,m) is not computed yet.
    list newlog := complog(p,k,m);
    update(p,k,<newlog,m>);         // The new record covers t=m.
    return newlog;
  } else return limit(m,log);       // We need the value at t=m.

function compBase(p: prop, k: int, m: int): tfn =
  if (k=0) then return Null;
  else { list blog := bellog(p,k-1,m);
         if (blog = []) then return Null
         else return v_part(head(blog)); }

Fig. 4. A progressive algorithm
function newTrue(p: prop, h: int, t: int, base: tfn) =    // p ∈ T+(h,t)
  return (newObsT(p,h,t,base) or
          newExpT(p,h,t) and not (newObsF(p,h,t,base)));
function newFalse(p: prop, h: int, t: int, base: tfn) =   // p ∈ F+(h,t)
  return (newObsF(p,h,t,base) or
          newExpF(p,h,t) and not (newObsT(p,h,t,base)));
function newObsT(p: prop, h: int, t: int, base: tfn) =    // p ∈ O+(h,t)
  return ((base=True) and obs(p,h,t));
function newObsF(p: prop, h: int, t: int, base: tfn) =    // p ∈ XO(h,t)
  return ((base=False) and (obs(p,h,t) or
          exists q in (rival p) s.t.
            ((compBase(q,h,t)=True) and obs(q,h,t))));
function newExpT(p: prop, h: int, t: int) =               // p ∈ E+(h,t)
  return (t>0 and obsall(h,t) and efc(p,h,t));
function newExpF(p: prop, h: int, t: int) =               // p ∈ XE(h,t)
  return (t>0 and obsall(h,t) and
          (efcneg(p,h,t) or exists q in (rival p) s.t. efc(q,h,t)));

Fig. 5. Each function checks if p is an element of a certain set (p ∈ Set)



satisfies the above properties for any (h, t) s.t. 0 < h < k and 0 < t < m. We will give 
the proof in the full paper. 



3 Execution Time 

We implemented both algorithms in Standard ML of New Jersey. We used an improved 
regressive algorithm that records and reuses function values. Figure 6 compares them in 
terms of execution time on a Sun Ultra 2. For example, gaca shows the execution time
for computing $B_g B_a B_c$ 'loc(a)=living', where $g$ (= $a_0$) can observe everything.
According to the figure, the progressive algorithm is much faster than the regressive one. 
Just as before, we assume that each agent can observe anything in the same room and 
nothing in other rooms. Their actions are restricted by how these rooms are connected, 
but our algorithm does not explicitly use the topology. Since we assumed that agents do 
not think of past unobserved events, they are not interested in the route another agent 
used to get to a room. 

In the initial state, a was in the kitchen, b was in the study, and c was in the living room. Then a and b moved: 'enter(a,living)', 'enter(b,living)', 'enter(b,dining)', 'enter(a,kitchen)'. After that, c repeated two actions, 'enter(c,r1)' and 'enter(c,r2)', 1000 times; the repetition does not affect the participants' beliefs about a's location or b's location at all. Finally, c executed 'enter(c,living)'. Thus, we have 2005 actions ($m = 4 + 2 \times 1000 + 1$).
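The arithmetic for $m$, and the payoff of the compact representation on this scenario, can both be checked with a short Python sketch. It reconstructs the action sequence described above and, assuming an observer who sees everything (so that its belief about loc(a) simply tracks a's actual location, as for $g$), records the change points of 'loc(a)=living'. The helper names are hypothetical, and the sketch does not model the partial observability of a, b and c.

```python
from typing import List, Tuple


def scenario(repeats: int = 1000) -> List[str]:
    """The action sequence of the experiment, as described in the text."""
    actions = ["enter(a,living)", "enter(b,living)", "enter(b,dining)", "enter(a,kitchen)"]
    for _ in range(repeats):
        actions += ["enter(c,r1)", "enter(c,r2)"]
    actions.append("enter(c,living)")
    return actions


def location_history(agent: str, initial: str, actions: List[str]) -> List[str]:
    """history[t] is the agent's actual location after t actions."""
    loc, history = initial, [initial]
    for act in actions:
        who, room = act[len("enter("):-1].split(",")
        if who == agent:
            loc = room
        history.append(loc)
    return history


def change_points(values: List[bool]) -> List[Tuple[bool, int]]:
    """Compact, newest-first record of when a boolean sequence changes."""
    changes: List[Tuple[bool, int]] = []
    for t, v in enumerate(values):
        if not changes or changes[0][0] != v:
            changes.insert(0, (v, t))
    return changes


actions = scenario()
hist = location_history("a", "kitchen", actions)
compact = change_points([room == "living" for room in hist])
print(len(actions), len(hist), compact)  # 2005 2006 [(False, 4), (True, 1), (False, 0)]
```

Three entries summarize all 2006 time points, while a table of every computed value would need one entry per time point.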

For beliefs based on the latest observation, the regressive algorithm can be faster than the progressive one. The figure shows that the regressive algorithm is faster than the progressive one in a few cases, but in general, the progressive one is much faster than the regressive one. When we use balanced trees instead of lists, the progressive one becomes even faster. The figure also shows execution times for a regressive algorithm that uses compact lists (regressive2), but it is not very fast.

Fig. 6. Comparison of execution time (CPU time in seconds)



4 Concluding Remarks 



The motivation of this study is similar to that in [24], [6], and [34]. However, they do not 
explicitly discuss how one should update one’s belief about another agent’s belief. This 
study formalized and optimized our informal Prolog program [11]. The improvements 
we presented in this paper remove the serious impediments to applying the nested belief 
algorithm to practical problems. Since various kinds of domain knowledge can now be written for the
algorithm, we hope to find more practical applications.

However, the new algorithm is still restricted. If agents are intelligent enough, they 
will revise their beliefs about the past (e.g., the borrowed car example [10]). This means
we will have to introduce belief revision about the past. We have another algorithm 
modified for such cases, but we have not fully analyzed it yet. After the analysis, we will 
report the result in another paper. Other clues such as abduction [18], plausibility [10], 
and utility of actions [5,35] are also important. 

We did not discuss message exchange among agents [33,8,25], but it is undoubtedly 
an important topic. When we think of belief or knowledge given by a message, we 
need criteria in order to keep consistency. Lomuscio et al. [23] classify knowledge 
dependences among agents in static settings. 

There are several logic-based agent programming languages [21,20,3,36] formalized
for actions and sensing. It would be interesting to see if our algorithm can be embedded 
in these systems. 
Abstract. Although software agents are becoming more widely used, the methodology for constructing agent programs is poorly understood. In this paper, we take a step towards specifying and proving correctness for a class of agent programs based on the PRS architecture, Georgeff and Lansky [9], one of the most widely used in industrial settings. We view PRS as a simplified operating system capable of concurrently running a series of plans, each of which at any time is in a state of partial execution. The PRS system is construed as using a simplified interrupt mechanism to enable it, using information about goal priorities, to “recover” from various contingencies so that blocked plans can be resumed and eventually successfully completed. We develop a simple methodology for PRS program construction, then present a formalism combining dynamic logic and context-based reasoning that can be used to reason about such PRS plans.



1 Introduction 

Although software agents are becoming more widely used, as evidenced by the appearance of recent collections of articles with an applied emphasis, e.g. Bradshaw [4], methodology for constructing agent programs is poorly understood, at least in comparison
with standard procedural programs. This is despite a wealth of work on the foundations 
of agency and intention, on agent architectures, on logics for planning and action, and 
on reactive systems. The continuing deployment of agent systems in industrial settings 
only makes methodological questions more pressing. 

In this paper, we take a step towards specifying and proving correctness for a class 
of agent programs based on the PRS architecture, Georgeff and Lansky [9]. PRS and
its successor dMARS are two of the most widely used architectures for building agent
systems, and have been used in air traffic management, business process management
and air combat modelling, Georgeff and Rao [11]. PRS is a type of rational agent
architecture, by which is meant that it is based on taking seriously the notion of intention, 
e.g. as expounded by Bratman [5]. 

Agents differ from standard computer programs in being goal directed and in being 
situated in a dynamically changing environment. The main complication with reasoning 
about agent programs is that the programs must therefore be responsive to changes in 
their environments. This means, for rational agents, that even though an agent may have 
committed to some intentions that enable it to achieve a particular goal, these intentions 
may need revision in the light of changes in the environment (or execution failures of
the agent, etc.). One way of looking at this is that the program actually changes as it is
being executed!



N.R. Jennings and Y. Lespérance (Eds.): Intelligent Agents VI, LNAI 1757, pp. 42-56, 2000.
© Springer-Verlag Berlin Heidelberg 2000
We take the view that PRS is a kind of simplified operating system capable of concurrently running
a series of plans, each of which at any time is in a state of partial execution.
The system is operating in an environment which is dynamically changing, and the job 
of the interpreter is to monitor these changes and respond to them in such a way that 
the plans can succeed in achieving their goals. It does this, we contend, by use of a 
simplified interrupt mechanism that enables the system, using information about goal 
priorities, to “recover” from various contingencies so that blocked plans can be resumed 
and eventually completed. The job of the programmer is to specify plans that can be 
invoked to deal with every contingency that can occur. If it is possible to recover from 
every contingency, the system can be guaranteed to achieve its preset goals. 

We develop a formalism that can be used to reason about PRS programs viewed in this 
way, without claiming that this is the only way that PRS can be viewed. Our formalism 
is based on dynamic logic, and thus models programs as state transition functions (but
whereas in Computer Science the states are internal machine states, our states are external
world states). The organization of the paper is as follows. In section 2, we discuss the
various antecedents of this work and elaborate on the motivation for developing this line 
of research. In section 3, we review the PRS architecture, and in section 4, present a 
simple method for constructing PRS agent programs. We give a formalism for reasoning 
about PRS program construction based on our methodology in section 5, and illustrate 
the use of the formalism with a simple correctness proof in section 6. 



2 Motivation 

There have been various strands of work bearing on the question of program correctness 
for agent programs. Perhaps the best understood is the work on reactive systems.
Reactive systems differ from rational agent programs in that there is no
necessary notion of commitment and intention, which makes the architecture much simpler.
In effect, purely reactive systems are finite state machines which accept as input some
perceptual information about the current world state and then execute an action. In Artificial
Intelligence, Rosenschein and Kaelbling [20] have shown how such “situated automata” 
may be proven to have knowledge about the world and choose actions so as to achieve 
goals. Rao and Georgeff [19] have shown how model theoretic techniques may be used 
for the verification of reactive agent programs, and Fisher [7] has shown how to construct 
such agent programs from specifications written using a restricted form of temporal logic. 
However, these techniques do not straightforwardly apply to more complex architectures. 

There is also a growing literature on formalizing the logical properties of intention, 
e.g. Cohen and Levesque [6], Rao and Georgeff [17], Konolige and Pollack [13]. Some 
authors have considered formalizing the dynamics of intention, e.g. Asher and Koons [1], Georgeff and Rao [10], Bell [2] and Wobcke [24]. This work is mostly theoretical
in nature, meaning that the aim is a general theory of intention and/or intention revision. 
These techniques are thus not necessarily connected to any particular agent architectures 
or systems. 

The present work is primarily motivated by the aim of developing a formal system 
applicable to reasoning about the correctness of programs written for an actual agent 
architecture, here the PRS architecture. The formal system is thus heavily “grounded” in 
the architecture. In this way, the work is more closely related to that on reactive systems 
than to general theories of intention and action. In fact, we shall make relatively little
use of the notions of intention, desire, belief, etc., raising the question as to the extent 
to which PRS really is a BDI architecture. 

In its use of dynamic logic, our work is related to that on the logic of action, e.g. 
Moore [15], Belnap and Perloff [3], Segerberg [21], Singh [22], Wobcke [25], in which 
actions are also modelled as state transition functions. However, this work has been 
applied mainly in static environments. One novel aspect of the present work is to apply 
these techniques in dynamic environments. 



3 PRS Agent Programs 

PRS (Procedural Reasoning System) was initially described in Georgeff and Lansky [9].
Basically, PRS agent programs (as they will be called here) are collections of plans, 
officially called Knowledge Areas (KAs). These plans are essentially the same as standard 
plans in the Artificial Intelligence literature, in that they have a precondition (a condition 
under which the plan can be executed), an effect (a condition which successful execution 
of the plan will achieve), and a body (a collection of subactions which when successfully 
executed will achieve the effect). The body of a plan is very similar to a standard computer 
program, except that there can be subgoals of the form achieve g, meaning that the system 
should achieve the goal g in whichever way is convenient: these are the analogues of 
procedure calls. In addition, PRS plans have a context (a condition that must be true 
when each action in the plan is initiated), a trigger (a condition that indicates when the 
interpreter should consider the plan for execution), a termination condition (a condition 
indicating when the plan should be dropped), and a priority (a number indicating how 
important the plan’s goal is to achieve). The trigger is important in dynamic settings: 
when there are a number of ways of achieving a particular goal, the trigger helps the 
interpreter to find the “best” way of achieving the goal, given the current execution 
context. Note that due to unforeseen changes in the world, the execution context of a 
goal cannot always be predicted at planning time. The priority of each plan enables the 
system to determine which plan to pursue given limited resources (usually a plan with 
the highest priority is chosen for execution). Thus the use of triggers embodies a kind 
of “forward-directed” reactive reasoning, whereas goal reduction embodies a kind of 
“backward-directed” goal-driven reasoning. It is this combination of techniques which 
gives PRS its power. 

The original definition of PRS allowed for meta-KAs, or plans that are used to 
determine which other plans to execute. But in practice, these meta-plans have not 
been widely used, and do not figure in actual implementations of PRS such as UM-PRS [14]. Subsequently, Rao and Georgeff [18] presented the simplified interpreter
shown in Figure 1, effectively dispensing with the notion of meta-plans. In the following 
abstract interpreter, the system state consists of a set of beliefs B, goals G and intentions 
I. Each intention is, in effect, a partially executed program which has been interrupted 
due to a higher priority goal being pursued. Each cycle of the interpreter runs as follows. 
The process begins with the collection of external events recorded in an event queue, 
each of which may trigger pre-existing plans. Simple deliberation determines which plan 
is chosen for execution (usually a plan with the highest priority). The first action in this 
plan is then executed. After obtaining new external events, the set of system plans is 
updated. First, those plans which have completed execution are discarded from the set
I, then the system drops those plans which are determined to be impossible as a result
of changes to the world or lack of resources.



PRS Interpreter:
initialize-state();
do
    options := option-generator(event-queue, B, G, I);
    selected-options := deliberate(options, B, G, I);
    update-intentions(selected-options, I);
    execute(I);
    get-new-external-events();
    drop-successful-attitudes(B, G, I);
    drop-impossible-attitudes(B, G, I)
until quit



Fig. 1. PRS Interpreter



We assume the above version of PRS in this paper. Moreover, we make the following 
further simplifying assumptions. 

- Deliberation always chooses the highest priority intention for execution. 

- The priority of any subgoal of a plan is at least that of the original plan. 

This means that we can effectively view PRS as a kind of operating system executing a number of interruptible processes, each process corresponding to one high level
intention. Each process consists of a list of partially executed plans suspended at a call 
to achieve some subgoal, ordered according to the priority of the subgoals (higher level 
subgoals having lower priority). An interrupt occurs when an external event means that 
the current plan can no longer be executed. The interrupt is handled successfully when 
it triggers a plan that enables the system to “recover” from the change in the world to 
a point where the original plan can be resumed. Provided the programmer has defined 
sufficient contingency plans, i.e. at least one for each possible context in which each 
possible contingency can occur, the system should be able to recover and successfully 
achieve any assigned goal. Note the recursion inherent in this definition in that there can 
be contingencies to contingencies, i.e. contingencies that arise while a contingency plan 
is being executed. 
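The operating-system view can be illustrated with a small Python sketch. It is a hypothetical simplification, not the interpreter of Fig. 1: each process is a stack of partially executed plans, every cycle runs one step of the process whose top plan has the highest priority (the first simplifying assumption above), and an "achieve" step pushes a subplan whose priority is raised to at least that of its caller (the second assumption). Events, contexts and the dropping of impossible plans are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Frame:
    """A partially executed plan: its priority and the steps still to run."""
    name: str
    priority: int
    steps: List[str]  # primitive actions, or "achieve:<plan name>" subgoal calls


@dataclass
class Process:
    """One high-level intention: a stack of suspended plans (top = last)."""
    stack: List[Frame] = field(default_factory=list)

    def top_priority(self) -> int:
        return self.stack[-1].priority


def run(processes: List[Process], library: Dict[str, Frame], max_cycles: int = 100) -> List[str]:
    trace: List[str] = []
    for _ in range(max_cycles):
        live = [proc for proc in processes if proc.stack]
        if not live:
            break
        proc = max(live, key=Process.top_priority)  # deliberation: highest priority wins
        frame = proc.stack[-1]
        if not frame.steps:
            proc.stack.pop()  # plan completed: resume the suspended caller
            continue
        step = frame.steps.pop(0)
        if step.startswith("achieve:"):
            sub = library[step[len("achieve:"):]]
            # A subgoal's priority is at least that of the calling plan.
            proc.stack.append(Frame(sub.name, max(sub.priority, frame.priority), list(sub.steps)))
        else:
            trace.append(step)
    return trace


library = {"sub": Frame("sub", 5, ["s1"])}
low = Process([Frame("main", 10, ["a1", "achieve:sub", "a2"])])
high = Process([Frame("urgent", 20, ["b1"])])
print(run([low, high], library))  # ['b1', 'a1', 's1', 'a2']
```

The priority-20 process runs first; the subgoal call then suspends main until sub finishes, after which main resumes at a2.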



4 PRS Program Construction 

Our treatment of correctness relies on a particular approach to the construction of PRS 
programs which we present in this section, then formalize in section 6. We do not claim 
that the formalization applies to any PRS program. Consider the basic task facing the 
programmer when designing a correct program: this is to ensure that, for any given goal,
there is a plan that can be used to reach that goal from any one of a predetermined set of
initial states. Note, however, that correctness is not all there is to agent programming: 
optimality is also a key issue, one we do not address. Optimality refers to the performance 
of the agent over time, and optimal performance may require the agent to take advantage 
of opportunities that arise during execution, meaning that not all plans that are initiated 
execute to completion. However, from the point of view of program design, each plan 
is formulated as if it will be executed completely. 

Consider designing a plan to achieve a particular goal g. We give the following 
intuitive picture as to how this might be done, taking for now the simple case in which 
it is assumed that there are no calls to subgoals and no contingencies that arise during 
execution. Recall that each plan has an associated context, a condition that must be true 
throughout the plan’s execution. It seems natural to start by determining a collection of 
possible initial states $S$, then proceed by dividing this set into subsets of states $S_i$ such that for each subset $S_i$, it is possible to define a single plan $P_i$ that can achieve $g$ without leaving the states in $S_i$ (except possibly at the end of the plan, when $g$ itself is true). The subsets $S_i$ need not be disjoint, but their union should equal $S$. The next task is to define formulae $c_i$ that characterize the $S_i$, meaning that each $c_i$ is true of all the states in $S_i$ but not true of any state not in $S_i$ (because $P_i$ does not work in these states); this is not necessarily straightforward! The correctness of the plan $P_i$ can be expressed as the dynamic logic formula $c_i \Rightarrow [P_i]g$, and proven using standard techniques. Furthermore, the formula $c_1 \vee \cdots \vee c_n$ (assuming there are a finite number of contexts $1, \ldots, n$) characterizes the set of initial states $S$, and the assumption that $S$ contains all the possible initial states is expressed by the formula $\Box(c_1 \vee \cdots \vee c_n)$. From this and the correctness of the individual plans, it follows that $[P_1]g \vee \cdots \vee [P_n]g$.
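For a finite toy domain, the two obligations in this design method (the contexts cover $S$, and each $c_i \Rightarrow [P_i]g$ holds on $S$) can be checked mechanically. The Python sketch below is illustrative only: states are integers, a plan is modelled as a function returning the set of possible final states (so nondeterministic outcomes are allowed), and the domain, a robot on positions 0 to 4 that must reach position 2, is hypothetical.

```python
from typing import Callable, List, Set

State = int
Plan = Callable[[State], Set[State]]  # possible final states of executing the plan
Formula = Callable[[State], bool]


def box(plan: Plan, goal: Formula, s: State) -> bool:
    """[P]g at s: g holds in every state in which P can end when started in s."""
    return all(goal(u) for u in plan(s))


def check_design(S: Set[State], contexts: List[Formula], plans: List[Plan], goal: Formula) -> bool:
    covered = all(any(c(s) for c in contexts) for s in S)  # S is within S_1 u ... u S_n
    correct = all(box(P, goal, s)                           # c_i => [P_i]g on every state of S
                  for c, P in zip(contexts, plans) for s in S if c(s))
    return covered and correct


def at_two(s: State) -> bool:
    return s == 2


def move_right(s: State) -> Set[State]:
    return {2}  # walk right until position 2


def move_left(s: State) -> Set[State]:
    return {2}  # walk left until position 2


S = set(range(5))
print(check_design(S, [lambda s: s < 2, lambda s: s >= 2], [move_right, move_left], at_two))
print(check_design(S, [lambda s: s < 2, lambda s: s > 2], [move_right, move_left], at_two))
```

The first design is accepted; the second is rejected because no context covers position 2.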

Now consider designing plans to respond to “procedure calls”, i.e., to satisfy subgoals of the form achieve $g$ occurring in a plan $P$. The plan will have a context $c$ that characterizes a set of states $S$: each state in $S$ is one in which $P$ achieves its goal, assuming that all the calls to achieve $g$ succeed. Note first that not only should the procedure achieve $g$, but it should also maintain the context $c$. Thus the context of the subprogram should entail $c$. Now we may proceed as above, decomposing $S$ into subsets $S_i$, characterizing those subsets by context formulae $c_i$, and defining plans $P_i$ with contexts $c_i$ that achieve $g$. The priority of each subplan $P_i$ should be at least that of $P$. It is apparent that by repeating this process for calls to achieve subgoals within subprograms, the programmer defines a hierarchy of contexts by continually partitioning the original set of states $S$. For each subprogram $P_i$ with set of states $S_i$ and context $c_i$ of a plan $P$ with set of states $S$ and context $c$, we have that $S_i \subseteq S$ and $c_i \vdash c$. That is, the hierarchy forms a partial order on sets of states with the ordering inherited from set inclusion. Note that, as above, from a collection of formulae $c_i \Rightarrow [P_i]g$ for $i = 1, \ldots, n$, it follows that $c \Rightarrow ([P_1]g \vee \cdots \vee [P_n]g)$, where $c = c_1 \vee \cdots \vee c_n$.
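The inference at the end of the preceding paragraph is ordinary reasoning by cases. Written as a single rule, under the same assumption of finitely many contexts, it reads:

$$
\frac{c_1 \Rightarrow [P_1]g \qquad \cdots \qquad c_n \Rightarrow [P_n]g}
     {(c_1 \vee \cdots \vee c_n) \Rightarrow ([P_1]g \vee \cdots \vee [P_n]g)}
$$

Each case $c_i$ yields $[P_i]g$, which weakens to the whole disjunction; nothing beyond the individual correctness of the $P_i$ is used, so the $S_i$ need not be disjoint.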

The next stage is to consider the contingencies that can arise while executing a plan 
P. The purpose of the contingency plans is, whenever possible, to restore the context of 
the original plan c (or if this is impossible, to cause the original plan to be dropped by 
achieving ¬c). However, it is not necessary that each contingency plan achieve c: below,
we give a simple example where executing a sequence of contingency plans restores c. 
Moreover, a contingency plan need not directly achieve c; rather it can block the original 
plan from being executed until c is true. This is also illustrated in the example below. 
Even so, it seems natural to start with a set of states S that defines when the contingency
occurs, and to divide this set into subsets for each of which a contingency plan can be 
defined. The priority of any contingency plan must be greater than that of the original 
plan to ensure that the contingency plan is chosen by the interpreter for execution in 
preference to the original plan. Any subgoals in the contingency plan can be handled as 
described above. Now by repeating this process, i.e. by defining contingencies to handle 
contingencies, the programmer also defines a hierarchy of contexts, but in contrast to that 
defined for subprograms, this is a hierarchy of exceptions. That is, if $P_i$ is a contingency
plan with set of states $S_i$ and context $c_i$ for a plan $P$ with set of states $S$ and context
$c$, then there is no necessary relationship between $S_i$ and $S$, nor between $c_i$ and $c$. The
whole design process stops when there are no remaining contingencies to consider. 

For example, consider designing a simplified program for an aircraft to take off. 
Assume the basic takeoff plan can be defined, and succeeds provided the runway is free. 
That is, the condition ¬runway_free is a contingency. Two plans are defined to deal with
this contingency, differing in their context of application. In one, the plane is on the 
runway and must be diverted; in the other, the plane is not on the runway and simply 
waits. Note the subtleties in even this program: the divert plan does not restore the 
original context runway_free, but changes the context to ¬on_runway so that the wait
plan is invoked. Also, the wait plan may be repeatedly invoked until its trigger is false; 
when this is the case, the context of the takeoff plan is true, so this plan can be resumed. 
It only remains to assign priorities to the plans such that the contingency plans have 
higher priority than the takeoff plan (this can be done in any way convenient, so just let 
the priority of takeoff be 10 and the priority of divert and wait be 20). 
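The interplay of triggers, contexts and priorities in this example can be traced with a small simulation. The Python sketch below is a hypothetical simplification: the runway's availability is scripted per cycle, takeoff is reduced to two steps (line up, then lift off) chosen from the current state, and at each cycle the interpreter picks the highest-priority plan whose trigger or context currently holds, using the priorities 10 and 20 from the text.

```python
from typing import List, Set, Tuple

TAKEOFF, CONTINGENCY = 10, 20  # priorities assigned in the text


def simulate(blocked: Set[int], max_cycles: int = 20) -> List[str]:
    """Run the takeoff program; the runway is occupied during the cycles in `blocked`."""
    state = {"on_runway": False, "airborne": False}
    trace: List[str] = []
    for cycle in range(max_cycles):
        if state["airborne"]:
            break
        runway_free = cycle not in blocked  # scripted change in the environment
        options: List[Tuple[int, str]] = []
        if not runway_free and state["on_runway"]:
            options.append((CONTINGENCY, "divert"))  # trigger: not runway_free and on_runway
        if not runway_free and not state["on_runway"]:
            options.append((CONTINGENCY, "wait"))    # trigger: not runway_free and not on_runway
        if runway_free:
            options.append((TAKEOFF, "takeoff"))     # context: runway_free
        _, plan = max(options)                       # highest priority is chosen
        if plan == "divert":
            state["on_runway"] = False               # makes wait applicable on the next cycle
        elif plan == "takeoff" and not state["on_runway"]:
            state["on_runway"] = True
            plan = "takeoff:line_up"
        elif plan == "takeoff":
            state["airborne"] = True
            plan = "takeoff:lift_off"
        trace.append(plan)
    return trace


print(simulate(blocked={1, 2, 3}))
# ['takeoff:line_up', 'divert', 'wait', 'wait', 'takeoff:line_up', 'takeoff:lift_off']
```

As in the text, divert does not restore runway_free; it moves the plane off the runway so that wait applies, and takeoff resumes once the runway is free again.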

The final program is shown in Figure 2. Boxes indicate contexts and an arrow from 
one context to another indicates that the first handles a contingency that can arise while 
the second is executing. 




Fig. 2. Takeoff Program 
5 Formal Semantics

We now present a formalism that can be used to reason about PRS programs constructed 
as in section 4. The essence of the formalism is the combination of dynamic logic and 
context-based reasoning, Wobcke [23]. The technical formalism is related to labelled
deductive systems, Gabbay [8], in which each formula is assigned a label, signifying 
a context in which the formula is true. For a formula representing the correctness of a 
plan, the label can be identified with the possible execution contexts of the plan, and 
hence the label also indirectly represents a set of assumptions under which the plan can 
be proven correct. 

More precisely, we define a formal language LPDL (Labelled Propositional Dynamic
Logic), whose atomic formulae are of the form l : A where l is drawn from a set of labels 
and A is a formula of propositional dynamic logic. In this section, we first review dynamic 
logic, then present a formal semantics for LPDL which is then specialized to the case of 
PRS programs. In section 6, we give rules for reasoning about PRS agent plans which 
are illustrated using the aircraft takeoff plan. 

In our use of dynamic logic, our research follows that of Singh [22], who uses 
dynamic logic to model agent programs in an attempt to account for both the choice 
available to the agent in action selection and the nondeterminism in the execution of 
the actions. However, this work does not apply in dynamic environments, and it is precisely
the changes in the environment, and the agent’s responses to them, that we
most wish to model using our formalism.

5.1 Propositional Dynamic Logic 

In propositional dynamic logic, Pratt [16], program execution is modelled using state
transition relations. But whereas in Computer Science the states are internal machine
states, in AI the states are external world states; and whereas in Computer Science the
programs are guaranteed to succeed (if they terminate), in AI actions are not guaranteed
to be successful. Moreover, in Computer Science the internal machine states change
only as the result of the program’s execution, whereas in AI there may be unforeseen
changes to the world that are not the result of any action of the agent.

The formal language of propositional dynamic logic consists of a set of formulae
defined as follows, see Goldblatt [12]. First, the language consists of a set of atomic
proposition symbols $P$ and atomic action symbols $A$. Program terms are built from atomic
action symbols and the connectives $;$ (sequencing), $\cup$ (nondeterministic alternation), $^*$
(iteration) and $?$ (test). Formulae are built from the set of proposition symbols in
combination with the logical connectives $\neg$, $\wedge$, $\vee$ and $\Rightarrow$, and modal operators $[\alpha]$ (one such
operator for each program term $\alpha$). Here $[\alpha]A$ is intended to indicate that $A$ is true in
all possible states that result from the successful execution of $\alpha$. Note that the standard
programming constructs can be defined in terms of these primitives as follows:

$$\textbf{if } A \textbf{ then } \alpha \textbf{ else } \beta \;=\; (A?;\alpha) \cup (\neg A?;\beta)$$

$$\textbf{while } A \textbf{ do } \alpha \;=\; (A?;\alpha)^*;\neg A?$$

The semantics of dynamic logic is based on binary state transition relations. More
precisely, an interpretation consists of a set of program states $S$ together with a set of
binary relations $R_\alpha$ on $S$, one for each program term $\alpha$.
Definition 1. A PDL interpretation $M$ is a tuple $(S, R, V)$ where $S$ is a nonempty set
of states, $R$ is a family of binary relations $R_\alpha$ on $S$, one for each program term $\alpha$, and
$V$ is a mapping from the set of atomic proposition symbols $P$ to $2^S$.

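The following constraints on the $R_\alpha$ ensure that each respects the operational
semantics of the program construction operators.

$$R_{\alpha;\beta} = R_\alpha \circ R_\beta = \{(s,t) : \exists u\,(R_\alpha(s,u) \text{ and } R_\beta(u,t))\}$$

$$R_{\alpha \cup \beta} = R_\alpha \cup R_\beta$$

$$R_{\alpha^*} = R_\alpha^* \quad \text{(the reflexive transitive closure of } R_\alpha\text{)}$$

$$R_{A?} = \{(s,s) : M \models_s A\}$$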
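To make these constraints concrete, the following Python sketch computes the relation denoted by a compound program term over a finite state set, and builds the if and while constructs defined earlier from the primitives. It is illustrative only: the tuple encoding of program terms, the use of Python predicates in place of test formulae, and all function names are assumptions made for the example, not part of the formalism.

```python
# Relational semantics of PDL program terms over a finite state set (sketch).
# Hypothetical encoding of program terms as nested tuples:
#   ("atom", name) | ("seq", a, b) | ("choice", a, b) | ("star", a) | ("test", pred)
# A test carries a Python predicate on states in place of a formula A.


def compose(r1, r2):
    """R_a o R_b: pairs (s, t) with some u such that (s, u) in r1 and (u, t) in r2."""
    return {(s, t) for (s, u) in r1 for (v, t) in r2 if u == v}


def refl_trans_closure(r, states):
    """Reflexive transitive closure of r on the given states."""
    closure = {(s, s) for s in states} | set(r)
    while True:
        bigger = closure | compose(closure, r)
        if bigger == closure:
            return closure
        closure = bigger


def rel(term, states, atomic):
    """Compute R_term from the relations of the atomic actions."""
    kind = term[0]
    if kind == "atom":
        return set(atomic[term[1]])
    if kind == "seq":
        return compose(rel(term[1], states, atomic), rel(term[2], states, atomic))
    if kind == "choice":
        return rel(term[1], states, atomic) | rel(term[2], states, atomic)
    if kind == "star":
        return refl_trans_closure(rel(term[1], states, atomic), states)
    if kind == "test":
        return {(s, s) for s in states if term[1](s)}
    raise ValueError(f"unknown program term: {kind!r}")


def negate(pred):
    return lambda s: not pred(s)


def if_then_else(cond, a, b):
    """if A then a else b  =  (A?; a) U (not A?; b)"""
    return ("choice", ("seq", ("test", cond), a), ("seq", ("test", negate(cond)), b))


def while_do(cond, a):
    """while A do a  =  (A?; a)*; not A?"""
    return ("seq", ("star", ("seq", ("test", cond), a)), ("test", negate(cond)))


if __name__ == "__main__":
    states = range(4)
    atomic = {"inc": {(0, 1), (1, 2), (2, 3)}}
    loop = while_do(lambda s: s < 3, ("atom", "inc"))
    print(sorted(rel(loop, states, atomic)))  # [(0, 3), (1, 3), (2, 3), (3, 3)]
```

With a single `inc` action on states 0 to 3, `while s < 3 do inc` relates every state to 3, which is what the operational reading of the loop predicts. Note that the star case uses the reflexive closure, so zero iterations are allowed.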

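Definition 2. A PDL interpretation $M = (S, R, V)$ satisfies a formula $A$ at a state
$s \in S$ as follows.

$M \models_s A$ iff $s \in V(A)$, for $A$ an atomic proposition symbol
$M \models_s \neg A$ iff $M \not\models_s A$
$M \models_s A \wedge B$ iff $M \models_s A$ and $M \models_s B$
$M \models_s A \vee B$ iff $M \models_s A$ or $M \models_s B$
$M \models_s A \Rightarrow B$ iff $M \not\models_s A$ or $M \models_s B$
$M \models_s [\alpha]A$ iff $R_\alpha(s,t)$ implies $M \models_t A$, for all $t \in S$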

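It can be shown, Goldblatt [12], that the following axiom schemata and rule are
sound and complete with respect to the above semantics (including the constraints
on the $R_\alpha$).

$$[\alpha;\beta]A \Leftrightarrow [\alpha][\beta]A$$

$$[\alpha \cup \beta]A \Leftrightarrow ([\alpha]A \wedge [\beta]A)$$

$$[\alpha^*]A \Rightarrow (A \wedge [\alpha][\alpha^*]A)$$

$$[\alpha^*](A \Rightarrow [\alpha]A) \Rightarrow (A \Rightarrow [\alpha^*]A)$$

$$[A?]B \Leftrightarrow (A \Rightarrow B)$$

$$[\alpha](A \Rightarrow B) \Rightarrow ([\alpha]A \Rightarrow [\alpha]B)$$

If $\vdash A$ infer $[\alpha]A$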
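Soundness of the axiom schemata can be spot-checked mechanically: on any finite model, every instance must hold at every state. The Python sketch below represents a formula by its extension (the set of states where it is true), draws random relations and valuations, and asserts that instances of the six schemata above are valid. It is a sanity check of the semantics under an encoding chosen for the example, not a proof, and it does not exercise the inference rule.

```python
import random

# Spot-check of the PDL axiom schemata on random finite models (a sketch, not a proof).
# Formulas are represented by their extensions: the set of states where they hold.


def compose(r1, r2):
    return {(s, t) for (s, u) in r1 for (v, t) in r2 if u == v}


def star(r, states):
    """Reflexive transitive closure, as required for R_{a*}."""
    closure = {(s, s) for s in states} | r
    while True:
        bigger = closure | compose(closure, r)
        if bigger == closure:
            return closure
        closure = bigger


def box(r, ext, states):
    """Extension of [a]A: states all of whose r-successors lie in ext."""
    return {s for s in states if all(t in ext for (u, t) in r if u == s)}


def check_axioms(trials, n_states, seed=0):
    rng = random.Random(seed)
    S = set(range(n_states))

    def implies(x, y):  # extension of x => y
        return (S - x) | y

    def rand_rel():
        return {(s, t) for s in S for t in S if rng.random() < 0.3}

    def rand_ext():
        return {s for s in S if rng.random() < 0.5}

    for _ in range(trials):
        ra, rb, A, B = rand_rel(), rand_rel(), rand_ext(), rand_ext()
        ra_star = star(ra, S)
        test_a = {(s, s) for s in A}  # R_{A?}
        # X => Y is valid iff ext(X) <= ext(Y); X <=> Y is valid iff they are equal.
        assert box(compose(ra, rb), A, S) == box(ra, box(rb, A, S), S)
        assert box(ra | rb, A, S) == box(ra, A, S) & box(rb, A, S)
        assert box(ra_star, A, S) <= A & box(ra, box(ra_star, A, S), S)
        assert box(ra_star, implies(A, box(ra, A, S)), S) <= implies(A, box(ra_star, A, S))
        assert box(test_a, B, S) == implies(A, B)
        assert box(ra, implies(A, B), S) <= implies(box(ra, A, S), box(ra, B, S))
    return trials


print(check_axioms(trials=200, n_states=4))  # prints 200 when every instance holds
```

The third and fourth checks depend on $R_{\alpha^*}$ being reflexive: with a merely transitive closure, the instance $[\alpha^*]A \Rightarrow A$ can fail at a state with no $\alpha$-successors.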

In addition, we extend the language of PDL to include a necessity operator $\Box$. We
define $M \models_s \Box A$ iff $M \models_t A$ for all $t \in S$. It is evident that the necessity operator
obeys the S5 axioms, so the following axiom schemes need to be added to PDL.

$$\Box A \Rightarrow [\alpha]A$$

$$\Box(A \Rightarrow B) \Rightarrow (\Box A \Rightarrow \Box B)$$

$$\Box A \Rightarrow A$$

$$\Box A \Rightarrow \neg\Box\neg A$$

If $\vdash A$ infer $\Box A$
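Because $\Box$ quantifies over all of $S$, its extension is either every state or no state. The Python sketch below, again using extensions as an illustrative encoding, checks the first two schemes listed together with the S5 schemes T ($\Box A \Rightarrow A$) and 5 ($\neg\Box A \Rightarrow \Box\neg\Box A$) on random finite models.

```python
import random

# The global necessity operator: M |=_s []A iff A holds at every state of S.
# Extensions (sets of states) stand in for formulas in this sketch.


def nec(ext, states):
    """Extension of []A: all states if A holds everywhere, otherwise none."""
    return set(states) if set(states) <= ext else set()


def box(r, ext, states):
    return {s for s in states if all(t in ext for (u, t) in r if u == s)}


def check_necessity(trials, n_states, seed=0):
    rng = random.Random(seed)
    S = set(range(n_states))

    def implies(x, y):
        return (S - x) | y

    for _ in range(trials):
        r = {(s, t) for s in S for t in S if rng.random() < 0.3}
        A = {s for s in S if rng.random() < 0.8}  # high density so []A is often true
        B = {s for s in S if rng.random() < 0.8}
        assert nec(A, S) <= box(r, A, S)                               # []A => [a]A
        assert nec(implies(A, B), S) <= implies(nec(A, S), nec(B, S))  # K
        assert nec(A, S) <= A                                          # T
        assert S - nec(A, S) <= nec(S - nec(A, S), S)                  # 5
    return trials


print(check_necessity(200, 3))
```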



5.2 Labelled Propositional Dynamic Logic 

We now define a formal language LPDL (Labelled Propositional Dynamic Logic) for
use in context-based reasoning in conjunction with dynamic logic. The atomic formulae
of LPDL are of the form $l : A$, where $l$ is drawn from a set of labels and $A$ is a
formula of propositional dynamic logic. These atomic formulae can be combined using the
propositional connectives $\neg$, $\wedge$, $\vee$ and $\Rightarrow$.
Definition 3. Let $\Sigma$ be a nonempty set of states. A context in $\Sigma$ is a tuple $(\sigma, \iota)$ where $\sigma$
is a set of PDL interpretations all of whose states are contained in $\Sigma$ and $\iota$ is a function
assigning to each PDL interpretation $(S, R, V)$ contained in $\sigma$ a distinguished element
of $S$ (intuitively the initial state of $S$).

Definition 4. A context $(\sigma, \iota)$ satisfies a PDL formula $A$ if for each PDL interpretation
$M = (S, R, V)$ in $\sigma$, $M \models_{\iota(M)} A$.
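Definitions 3 and 4 can be sketched in Python by storing a context as a list of pairs $(M, \iota(M))$, so that the initial-state function travels with each interpretation. In this illustration, the `Interp` type, the combinators `atom`, `neg` and `box`, and the two sample interpretations are all hypothetical; a formula is modelled as a Python function deciding $M \models_s A$.

```python
from typing import Callable, Dict, List, NamedTuple, Set, Tuple


class Interp(NamedTuple):
    """A finite PDL interpretation (S, R, V)."""
    states: Set[int]
    rel: Dict[str, Set[Tuple[int, int]]]
    val: Dict[str, Set[int]]


# A context (sigma, iota), stored as pairs (M, iota(M)) for each M in sigma.
Context = List[Tuple[Interp, int]]
# A formula is modelled (by assumption) as a test of M |=_s A.
Formula = Callable[[Interp, int], bool]


def atom(p: str) -> Formula:
    return lambda m, s: s in m.val.get(p, set())


def neg(f: Formula) -> Formula:
    return lambda m, s: not f(m, s)


def box(prog: str, f: Formula) -> Formula:
    return lambda m, s: all(f(m, t) for (u, t) in m.rel.get(prog, set()) if u == s)


def context_sat(ctx: Context, f: Formula) -> bool:
    """Definition 4: (sigma, iota) |= A iff M |=_{iota(M)} A for every M in sigma."""
    for m, init in ctx:
        if init not in m.states:
            raise ValueError("iota(M) must be a state of M")
    return all(f(m, init) for m, init in ctx)


# Two hypothetical interpretations that differ only in where the plane ends up.
m1 = Interp({0, 1}, {"wait": {(0, 1)}}, {"on_runway": set()})
m2 = Interp({0, 1}, {"wait": {(0, 1)}}, {"on_runway": {1}})
```

A context containing only `m1` (started in state 0) satisfies `[wait]¬on_runway`, but adding `m2` breaks it, since the context must satisfy the formula at the initial state of every interpretation it contains.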



Definition 5. An exception is a formula of propositional logic. 

Definition 6. Let $C$ be a set of contexts. A context switching function on $C$ is a function
from $C$ to $2^C$.

Definition 7. Let $\mathcal{L}$ be a set of labels. An LPDL interpretation $\mathcal{M}$ on $\mathcal{L}$ is a tuple
$(\Sigma, C, \Xi, \mathcal{I}, \mathcal{R})$ where $\Sigma$ is a nonempty set of states, $C$ is a set of contexts in $\Sigma$, $\Xi$ is
a set of exceptions, $\mathcal{I}$ assigns to each label $l \in \mathcal{L}$ a context in $C$, and $\mathcal{R}$ is a family of
context switching functions $\mathcal{R}_\xi$ on $C$, one for each exception $\xi \in \Xi$.

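Definition 8. An LPDL interpretation $\mathcal{M} = (\Sigma, C, \Xi, \mathcal{I}, \mathcal{R})$ satisfies an LPDL formula
as follows.

$\mathcal{M} \models l : A$ iff $\mathcal{I}(l) \models A$, for $A$ a PDL formula
$\mathcal{M} \models \neg A$ iff $\mathcal{M} \not\models A$
$\mathcal{M} \models A \wedge B$ iff $\mathcal{M} \models A$ and $\mathcal{M} \models B$
$\mathcal{M} \models A \vee B$ iff $\mathcal{M} \models A$ or $\mathcal{M} \models B$
$\mathcal{M} \models A \Rightarrow B$ iff $\mathcal{M} \not\models A$ or $\mathcal{M} \models B$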
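The following Python sketch illustrates Definition 8 under a deliberate simplification: each label is mapped to a context abstracted as the list of valuations at the initial states of its interpretations, and the PDL formula inside $l : A$ is restricted to a propositional test on such a valuation. The label names and valuations are hypothetical and only loosely modelled on the takeoff example.

```python
# LPDL satisfaction (Definition 8), sketched with a simplification:
# each label maps to a context abstracted as the list of valuations (sets of
# true atoms) at the initial states of its interpretations, and the PDL part
# of "l : A" is restricted to a propositional test on one valuation.


def lpdl_sat(interp, f):
    op = f[0]
    if op == "at":  # l : A  iff  I(l) |= A
        _, label, pdl = f
        return all(pdl(v) for v in interp[label])
    if op == "not":
        return not lpdl_sat(interp, f[1])
    if op == "and":
        return lpdl_sat(interp, f[1]) and lpdl_sat(interp, f[2])
    if op == "or":
        return lpdl_sat(interp, f[1]) or lpdl_sat(interp, f[2])
    if op == "implies":
        return (not lpdl_sat(interp, f[1])) or lpdl_sat(interp, f[2])
    raise ValueError(f"unknown connective: {op!r}")


def on_runway(v):
    return "on_runway" in v


def not_on_runway(v):
    return "on_runway" not in v


# Hypothetical label assignment I: label -> initial-state valuations.
I = {
    "l_d": [{"on_runway"}, {"on_runway", "runway_free"}],  # divert starts on the runway
    "l_w": [set()],                                         # wait starts off the runway
}
```

Here `lpdl_sat(I, ("implies", ("at", "l_d", on_runway), ("not", ("at", "l_d", not_on_runway))))` is an instance of the scheme $l : A \Rightarrow \neg(l : \neg A)$; it holds because the context for `l_d` is nonempty.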

Given that we have so far placed no restrictions on the context switching functions 
in LPDL interpretations, it is straightforward to axiomatize LPDL. LPDL contains all 
instances of the propositional calculus axioms and modus ponens obtained by replacing 
an atomic proposition symbol by an LPDL formula, together with the following axiom 
schemes, in which l stands for any label. 

$l : A$ for $A$ an axiom of PDL
$l : (A \Rightarrow B) \Rightarrow (l : A \Rightarrow l : B)$
$l : A \Rightarrow \neg(l : \neg A)$

$l : A \Rightarrow l : \Diamond A$

Theorem 1. LPDL is sound and weakly complete with respect to the class of LPDL 
interpretations. 

This result is only a first step towards validating the semantics of LPDL against the 
operational semantics of the PRS interpreter. The following definitions lead up to the 
notion of a standard PRS model of a PRS program. 

Definition 9. A PRS program $\Pi$ is a set of plans each with a priority and a context
formula.
Definition 10. Let $\Pi$ be a PRS program and let $\Pi_p$ be the subset of $\Pi$ consisting of all
plans of priority $p$. The set of labels for $\Pi$ is defined as follows. First, each plan $\pi \in \Pi$
has a distinct associated label $l_\pi$. Then for any nonempty subset $\Pi'_p$ of $\Pi_p$ equal to
$\{\pi_1, \ldots, \pi_n\}$, there is a distinct label $l_{\pi_1} \cup \cdots \cup l_{\pi_n}$. Finally, for any integer $p$ for which a
plan of priority $p$ exists, there is a distinct label $l_p$.
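Definition 10 can be sketched as a small label generator. In this Python illustration, labels are rendered as strings, a union label is written `l_a U l_b`, and one-element unions are not repeated since they coincide with the plan's own label. The priority 10 assigned to the takeoff plan in the usage note is hypothetical; only the priority 20 of the contingency plans comes from the text.

```python
from itertools import combinations


def labels_for(program):
    """Set of labels for a PRS program (Definition 10), rendered as strings.

    program: dict mapping each plan name to its priority.
    Union labels are written "l_a U l_b"; a one-element union is just l_a.
    """
    by_priority = {}
    for plan, priority in program.items():
        by_priority.setdefault(priority, []).append(plan)
    labels = {f"l_{plan}" for plan in program}          # one label per plan
    for priority, plans in by_priority.items():
        plans = sorted(plans)
        for k in range(2, len(plans) + 1):               # larger same-priority subsets
            for subset in combinations(plans, k):
                labels.add(" U ".join(f"l_{p}" for p in subset))
        labels.add(f"l_{priority}")                      # one label per priority
    return labels
```

For the takeoff program, `labels_for({"takeoff": 10, "divert": 20, "wait": 20})` yields six labels: one for each plan, `l_divert U l_wait` for the pair of contingency plans, and `l_10` and `l_20` for the two priority levels. The number of union labels grows exponentially with the number of plans at one priority, which is harmless here because only the subsets actually used in a proof need to be considered.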

Definition 11. A PRS interpretation of a PRS program $\Pi$ is a tuple $(\mathcal{M}, \preceq)$ where
$\mathcal{M} = (\Sigma, C, \Xi, \mathcal{I}, \mathcal{R})$ is an LPDL interpretation on the set of labels for $\Pi$ and $\preceq$ is a
total pre-order on $C$ such that the interpretation function $\mathcal{I}$ and the context switching
functions $\mathcal{R}_\xi$ respect the following conditions for all labels for $\Pi$ defined as above.

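$$\mathcal{I}(l_{\pi_1} \cup \cdots \cup l_{\pi_n}) = \mathcal{I}(l_{\pi_1}) \cup \cdots \cup \mathcal{I}(l_{\pi_n})$$

$$\mathcal{I}(l_p) = \bigcup \{\mathcal{I}(l_\pi) : \pi \in \Pi_p\}$$

$$\mathcal{R}_\xi(\mathcal{I}(l_{\pi_1} \cup \cdots \cup l_{\pi_n})) = \mathcal{R}_\xi(\mathcal{I}(l_{\pi_1})) \cup \cdots \cup \mathcal{R}_\xi(\mathcal{I}(l_{\pi_n}))$$

$$\mathcal{R}_\xi(\mathcal{I}(l_p)) = \bigcup \{\mathcal{R}_\xi(\mathcal{I}(l_\pi)) : \pi \in \Pi_p\}$$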

The union of two contexts consists of the set of PDL interpretations in either context
together with the initial state assignment function extended in the natural way.

The following constraints on PRS interpretations, which define the class of stan- 
dard PRS models, are crucial for validating the behaviour of the interpreter in handling 
exceptions. 

Definition 12. Let $(\mathcal{M}, \preceq)$, where $\mathcal{M} = (\Sigma, C, \Xi, \mathcal{I}, \mathcal{R})$, be a PRS interpretation of
a PRS program $\Pi$. Each context $c \in C$ interprets a set of plans in $\Pi$ with the same
priority, $priority(c)$. Now let $c^{\prec}_{\xi}$ be the set of contexts $\{d \in C : c \prec d \text{ and } d \models \xi\}$.
Then a standard PRS model for $\Pi$ is a PRS interpretation that satisfies the following
conditions for all contexts $c, d \in C$.

(i) $c \prec d$ iff $priority(c) < priority(d)$

(ii) $\mathcal{R}_\xi(c) = c^{\prec}_{\xi}$
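Conditions (i) and (ii) determine the context switching functions from priorities alone. In the Python sketch below, contexts are named by strings, and whether a context satisfies an exception is supplied as data rather than computed; both are simplifying assumptions, as are the sample priorities (only the value 20 for the contingency plans comes from the example in section 6).

```python
def precedes(c, d, priority):
    """Condition (i): c < d iff priority(c) < priority(d)."""
    return priority[c] < priority[d]


def switch(c, xi, priority, satisfies):
    """Condition (ii): R_xi(c) = {d : c < d and d |= xi}.

    priority:  context name -> priority
    satisfies: context name -> set of exceptions the context satisfies
               (a stand-in for d |= xi)
    """
    return {d for d in priority if precedes(c, d, priority) and xi in satisfies[d]}


# Hypothetical contexts: "low" satisfies the exception but has too low a priority
# to be switched to from "takeoff".
priority = {"takeoff": 10, "divert": 20, "wait": 20, "low": 5}
satisfies = {
    "takeoff": set(),
    "divert": {"not runway_free"},
    "wait": {"not runway_free"},
    "low": {"not runway_free"},
}
```

With these values, `switch("takeoff", "not runway_free", priority, satisfies)` returns `{"divert", "wait"}`. As the discussion of Theorem 2 notes, this is the range of contexts that may handle the exception, not a prediction of which one the interpreter will actually choose.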

Theorem 2. Standard PRS models are faithful to the operational semantics of PRS. 

The idea behind standard PRS models is not to capture the actual execution sequences
of a PRS program, but rather to specify a range of possible execution sequences, much as
the PDL semantics captures nondeterminism using state transition relations. This is why
condition (ii) states, in effect, that any context satisfying $\xi$ which has a priority greater
than that of $c$ provides a context for handling the exception $\xi$ arising in $c$. However, it
may be that some of these contexts can never be chosen to deal with the exception $\xi$ in
context $c$, e.g. if there is always a higher priority plan that the interpreter could choose
instead. Thus standard PRS models do not provide a complete characterization of the
behaviour of PRS programs.

We do not attempt to prove this result here: in fact, it is not even clear that PRS (with 
our additional assumptions) is well enough specified to enable this result to be proven. 
So, alternatively, we could take the above to specify (though not completely) the correct 
behaviour of the PRS interpreter. Also, we have not attempted to axiomatize standard 
PRS models. Instead, we provide, in the next section, a number of sound inference rules 
which can be used for reasoning about the correctness of PRS agent programs. 
6 Reasoning About Correctness



The reasoning behind the process of designing PRS programs described in section 4 is 
essentially one of combining context-based reasoning for reasoning between contexts 
with dynamic logic for reasoning within contexts. For a given PRS program, there are 
numerous contexts that need to be considered. First, each plan is associated with a 
labelled context that corresponds to its execution context, as illustrated for the takeoff 
plan in Figure 2. Second, each contingency is itself associated with a labelled context 
corresponding to the execution contexts of the set of plans that may be invoked to deal 
with the contingency. Finally, each priority level is associated with a labelled context 
corresponding to the execution contexts of the set of plans of that priority — these are 
plans that can possibly compete with each other for selection by the interpreter for 
execution. 

A proof of correctness of a PRS program proceeds in stages, mirroring the design 
process. First, standard techniques are used to show correctness of plans and subprograms 
that execute in a single context. These proofs are all on the assumption that execution 
never leaves the assigned context, except possibly at the end of the plan when the goal 
is achieved. Next, reasoning between contexts is used to infer that all contingencies 
that arise during the execution of any plan can be successfully met. Any such proof 
of correctness is therefore reliant on the programmer’s having identified the range of 
possible contingencies to any plan. Finally, conclusions about lower level plans are 
“lifted” to higher level contexts, and the process repeated until the top level plans are 
reached. We present three rules corresponding to these types of inference. The soundness 
of these inference rules follows from properties of the PRS interpreter as reflected in 
the class of standard PRS models. We take it that they are intuitively correct, although 
future work would be to formalize the interpreter shown in Figure 1 to a degree where 
this could be verified formally. 

A proof begins with assumptions about the lowest level plans in the hierarchy, and 
proceeds inductively according to the structure of the context hierarchy, as indicated in 
the plan in Figure 2. The required assumptions all mean that there are no exceptions 
arising at the lowest level (highest priority) plans, and all have the following form. 

$$l : \Box((\mathit{context} \vee \mathit{goal}) \wedge \neg\mathit{termination})$$

We need $\mathit{context} \vee \mathit{goal}$ rather than just $\mathit{context}$ because of the technical complication that
the final state in the plan’s execution may not satisfy the context formula (it satisfies the
goal formula). We envisage, therefore, that the proof of correctness for the plan involves 
verifying that the goal formula is false after execution of each subaction in the plan, 
except possibly at the final state. 

The Contingency Rule is used to infer that all contingency plans achieve some goal 
$g$. Here $\mathit{Achieves}(g)$ (used informally to mean something like “eventually $g$”) is a special
formula intended to indicate that the agent has a plan or plans that achieve $g$, and knows
in which context to execute which plan. Here $l_1, \ldots, l_n$ are any finite number of labels.



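$$\frac{l_1 : t \Rightarrow [\alpha_1]g \qquad \cdots \qquad l_n : t \Rightarrow [\alpha_n]g}{l_1 \cup \cdots \cup l_n : t \Rightarrow \mathit{Achieves}(g)} \qquad \text{(Contingency Rule)}$$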
This rule is intended to be used when $l_1, \ldots, l_n$ denote all the contexts whose plans deal
with a contingency $t$ by achieving $g$: typically these plans all have the same priority.

The Priority Rule is used to infer that out of all plans that have a given priority (the 
same priority as a contingency plan), the agent can achieve some goal. This rule is needed 
to ensure that the interpreter is still able to choose the correct plan(s) for execution when 
it must choose from a larger set of plans. The rule is as follows, assuming that Z 1? • • • ,l n 
are all the plans that have priority p, and that l p is the label associated with p. It is intended 
that the rule is applied when p is the priority of the contingency plans for dealing with 
the trigger t, and g is the goal achieved by the contingency plans. 



$$\frac{l_1 : t \Rightarrow \mathit{Achieves}(g) \qquad \cdots \qquad l_n : t \Rightarrow \mathit{Achieves}(g)}{l_p : t \Rightarrow \mathit{Achieves}(g)} \qquad \text{(Priority Rule)}$$



The Lifting Rule connects contingency plans to the higher level plans from which 
their contingency derives. The rule is used to infer that in the higher level context, the 
contingency can be handled correctly. The soundness of this rule relies on the assumption
that a formula characterizing the contexts of the contingency plans corresponds to an
exception in the standard PRS model. In the following rule, $\xi$ denotes this exception and
$g$ the goal achieved by the contingency plans; $\mathit{goal}$ is the goal of the high level plan, whose context has label $l_c$.



$$\frac{l_p : \xi \Rightarrow \mathit{Achieves}(g)}{l_c : \Box((\xi \Rightarrow g) \vee \mathit{goal})} \qquad \text{(Lifting Rule)}$$



Intuitively, while the exception $\xi$ is true and the goal $g$ is not true, the agent’s execution
is in the context of $l_p$; hence all states in $c$ satisfy either the main goal or the negation of
$\xi \wedge \neg g$, i.e. $\xi \Rightarrow g$.

To illustrate the use of these rules in reasoning about PRS programs, consider the 
aircraft takeoff plan from Figure 2. We first assign labels (arbitrarily) to the execution 
contexts of the plans; let $l_t$ correspond to takeoff, $l_d$ to divert and $l_w$ to wait. The
reasoning starts at the leaves of the tree, where it is assumed there are no contingencies 
that arise during the context of executing these plans. In this example, the assumptions 
that are needed are as follows. 



$$l_d : \Box(\mathit{on\_runway} \vee \neg\mathit{on\_runway}) \qquad (1)$$

$$l_w : \Box(\neg\mathit{on\_runway} \vee \neg\mathit{on\_runway}) \qquad (2)$$

Assumption (1) is trivially true: it implies that there is no logical need for an exception 
to the divert plan. Intuitively this is because whenever such an exception could arise, 
the goal would already be true. However this does not preclude the possibility of plan 
failure for other reasons, and this could mean that further contingency plans are required. 
Assumption (2) is nontrivial: it states that when executing the wait plan, it is assumed 
that the plane is not on the runway. This would be false if it were possible for some event
to put the plane on the runway while it is waiting: this could also be a reason for
another contingency plan (perhaps the divert plan could be reused, although this would
have to be verified).

By constructing the proof, we aim to verify the following formula, which represents 
the assumption under which the takeoff plan should be proven correct. 

$$l_t : (\mathit{runway\_free} \vee \mathit{airborne}) \qquad (3)$$
Forming the basis of the proof, we assume that the following formulae, representing
the correctness of the individual plans relative to their execution contexts, can be proven
in dynamic logic from the above assumptions. Each formula says that whenever the
plan is initiated in a state in which its trigger is true, the plan achieves its goal.

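$$l_t : [\mathit{takeoff}]\mathit{airborne} \qquad (4)$$

$$l_d : \neg\mathit{runway\_free} \Rightarrow [\mathit{divert}]\neg\mathit{on\_runway} \qquad (5)$$

$$l_w : \neg\mathit{runway\_free} \Rightarrow [\mathit{wait}]\neg\mathit{on\_runway} \qquad (6)$$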

As an aside, the following formulae can be proven using standard dynamic logic 
from the embedded correctness formulae in equations (5) and (6). 

$$\neg\mathit{runway\_free} \Rightarrow [\mathit{wait}^*]\neg\mathit{on\_runway} \qquad (7)$$

$$\neg\mathit{runway\_free} \Rightarrow [\mathit{divert}; \mathit{wait}^*]\neg\mathit{on\_runway} \qquad (8)$$

The formulae indicate that both $\mathit{wait}^*$ and $\mathit{divert}; \mathit{wait}^*$ are possible plans in the
combined context, but note that it does not follow that these are the only plans that can
be executed in this context.

Now we need to start reasoning about contexts. Let $l_d \cup l_w$ denote the context
corresponding to the contingency $\neg\mathit{runway\_free}$. In the present example, the Contingency
Rule enables the inference of the following formula from (5) and (6), which means that in
every context associated with the contingency $\neg\mathit{runway\_free}$, the condition $\neg\mathit{on\_runway}$
is achieved.

$$l_d \cup l_w : \neg\mathit{runway\_free} \Rightarrow \mathit{Achieves}(\neg\mathit{on\_runway}) \qquad (9)$$

The Priority Rule is now used to infer the following formula, which means that the
set of plans at priority 20 handles the contingency $\neg\mathit{runway\_free}$ correctly (recall that the
contingency plans both have priority 20).

$$l_{20} : \neg\mathit{runway\_free} \Rightarrow \mathit{Achieves}(\neg\mathit{on\_runway}) \qquad (10)$$

Finally, the Lifting Rule is used to prove the following formula, meaning that while 
the plane is attempting to take off, it is not on the runway unless the runway is free. This 
represents a “safety” condition that it is desirable to verify in this example. 

$$l_t : \Box((\neg\mathit{runway\_free} \Rightarrow \neg\mathit{on\_runway}) \vee \mathit{airborne}) \qquad (11)$$

This means that the following formula holds at every nonfinal state in context c, and so 
is a candidate for the context of the takeoff plan. 

$$\mathit{on\_runway} \Rightarrow \mathit{runway\_free} \qquad (12)$$
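The purely propositional steps around (12) can be checked by truth table. The Python sketch below assumes, as the text does, that the nonfinal states of the takeoff plan are exactly those where airborne is false, and evaluates (11) at a single state. It confirms that (11) and (12) coincide at such states, that (12) alone admits a state where runway_free is false, and that adding on_runway (the negation of the proposed termination condition) restores runway_free.

```python
from itertools import product


def implies(a, b):
    return (not a) or b


def f11(runway_free, on_runway, airborne):
    """(11) evaluated at a single state."""
    return implies(not runway_free, not on_runway) or airborne


def f12(runway_free, on_runway):
    """(12): on_runway => runway_free."""
    return implies(on_runway, runway_free)


pairs = list(product([False, True], repeat=2))  # (runway_free, on_runway)

# At a nonfinal state (assumed: airborne is false), (11) and (12) coincide.
same_at_nonfinal = all(f11(rf, onr, False) == f12(rf, onr) for rf, onr in pairs)

# (12) alone does not entail runway_free ...
counterexamples = [(rf, onr) for rf, onr in pairs if f12(rf, onr) and not rf]

# ... but (12) together with on_runway (the negated termination condition) does.
with_termination = all(implies(f12(rf, onr) and onr, rf) for rf, onr in pairs)

print(same_at_nonfinal, counterexamples, with_termination)  # True [(False, False)] True
```

The single counterexample is the state where the plane is off the runway and the runway is not free, which is exactly the situation the contingency plans are meant to produce.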

But (12) does not entail the plan’s current context $\mathit{runway\_free}$. However, it is apparent
that (12) more correctly represents the plan’s context, in the sense that if (12) does not
hold, there is no guarantee the plan will work (with the current plan, it is
assumed, but not required, that the plane is always on the runway). It should therefore be
possible to prove (4) using this weaker context assumption. Alternatively, the condition
$\neg\mathit{on\_runway}$ could be added as a termination condition for the takeoff plan, so that (12)
together with the negation of this condition implies the current context. In either case, the
proof of correctness for the modified plan is now complete.
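The derivation of (9), (10) and (11) can be replayed as bookkeeping over symbolic judgements. The Python sketch below treats each of the three rules as a function that checks its side conditions (shared trigger and goal, and, for the Priority Rule, that the premises cover every plan of priority $p$) and builds the conclusion. The class and function names are hypothetical, formulae are plain strings, and nothing here checks the dynamic-logic premises (5) and (6) themselves; they are taken as given, as in the text.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Correct:          # l : t => [program] g
    plans: FrozenSet[str]
    trigger: str
    program: str
    goal: str


@dataclass(frozen=True)
class Achieves:         # label : t => Achieves(g)
    label: str
    plans: FrozenSet[str]
    trigger: str
    goal: str


@dataclass(frozen=True)
class Lifted:           # l_c : box((xi => g) v goal)
    label: str
    exception: str
    goal: str
    main_goal: str

    def __str__(self):
        return f"{self.label} : box(({self.exception} => {self.goal}) v {self.main_goal})"


def _common(premises, attr):
    values = {getattr(p, attr) for p in premises}
    if len(values) != 1:
        raise ValueError(f"premises disagree on {attr}: {values}")
    return values.pop()


def contingency_rule(premises: Iterable[Correct]) -> Achieves:
    premises = list(premises)
    plans = frozenset().union(*(p.plans for p in premises))
    label = " U ".join(f"l_{p}" for p in sorted(plans))
    return Achieves(label, plans, _common(premises, "trigger"), _common(premises, "goal"))


def priority_rule(premises: Iterable[Achieves], p: int, plans_at_p: FrozenSet[str]) -> Achieves:
    premises = list(premises)
    plans = frozenset().union(*(q.plans for q in premises))
    if plans != plans_at_p:  # the rule needs every plan of priority p
        raise ValueError("premises must cover all plans of priority p")
    return Achieves(f"l_{p}", plans, _common(premises, "trigger"), _common(premises, "goal"))


def lifting_rule(premise: Achieves, context_label: str, main_goal: str) -> Lifted:
    return Lifted(context_label, premise.trigger, premise.goal, main_goal)


f5 = Correct(frozenset({"divert"}), "not runway_free", "divert", "not on_runway")
f6 = Correct(frozenset({"wait"}), "not runway_free", "wait", "not on_runway")
f9 = contingency_rule([f5, f6])
f10 = priority_rule([f9], 20, frozenset({"divert", "wait"}))
f11 = lifting_rule(f10, "l_t", "airborne")
print(f9.label)   # l_divert U l_wait
print(f11)        # l_t : box((not runway_free => not on_runway) v airborne)
```

Making the Priority Rule fail loudly when a plan of priority 20 is missing mirrors the point made in the text: the conclusion is only as good as the programmer's enumeration of the plans competing at that priority.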
7 Conclusion

Our formalism for verifying PRS programs is based on dynamic logic and reasoning in 
context. The method presumes a simple hierarchical design process for PRS programs 
in which contingencies to plans are identified and contingency plans then defined to deal 
with them. Any proof of correctness is dependent on the programmer’s having identified 
all the possible contingencies to any plan, and on the correctness of the contingency 
plans themselves. We do not claim that this is the only way to construct PRS agent 
programs, nor that this is the only way of applying formal methods to proving correctness 
for agent programs. The main lesson, we feel, is that any methodology for verifying 
agent programs must refer explicitly to the architecture of the system; the comparative 
simplicity of reactive systems perhaps explains why verification techniques for these 
systems are further advanced than those for more complex architectures. 

What we have attempted is to develop a method for reasoning about PRS programs 
that is firmly grounded in the actual architecture. The principal advantage of our approach 
is that the use of contexts provides a way for the designer of a PRS program to manage the 
complexity involved in developing agent programs. More precisely, the method allows 
the designer to concentrate on a subset (hierarchically ordered) of the possible execution 
contexts, rather than having to consider all possible worlds (or equivalently, all possible 
sets of beliefs, goals and intentions). Thus the task of reasoning about correctness is 
significantly simplified, albeit at the cost of covering only a subclass of the possible PRS 
programs. As a practical consequence, this work presents the possibility of developing a 
tool for PRS program development based on a system of contexts each used for reasoning 
about a standard plan or program. 
Abstract. Commitments play a central role in multi-agent coordination. However,
they are inherently uncertain and it is important to take these uncertainties
into account during planning and scheduling. This paper addresses the problem of
handling the uncertainty in commitments. We propose a new model of commitment
that incorporates the uncertainty, the use of contingency analysis to reduce
the uncertainty, and a negotiation framework for handling commitments with
uncertainty.



1 Introduction 

In a multi-agent system, each agent can only have a partial view of other agents’
behavior. Therefore, in order to coordinate the agents’ activities, the agents need
a mechanism to bridge their activities based on this partial knowledge. Commitments
have emerged, in the work of many research groups [3,5,6,12], as the bridge for multi-agent
coordination and planning.

By definition, a commitment specifies a pledge to do a certain course of action [13].
A number of commitment semantics have been proposed; for example, the “Deadline”
commitment $C(T, Q, t_{dl})$ in [6] means a commitment to do (achieve quality $Q$ or above
for) a task $T$ at a time $t$ so that it finishes before a specified deadline $t_{dl}$. When such a
pledge is offered, the receiving agent can then do its own reasoning and planning based
on this commitment, and thus coordination between the agents is achieved.

However, there are a number of uncertainties associated with commitments. First,
there is the question of whether or not the commitment can be fulfilled by the offering
agent. Tasks may fail, for example, and thus cannot achieve the quality promised. Or, 
the results may be delayed and therefore cannot meet the deadline. Also, the task T itself 
may depend on some preceding actions, and there are uncertainties about those actions. 
Since the receiving agent depends on the predictable outcome of the commitment, this
uncertainty must be considered. This type of uncertainty originates from the uncertainty 
of the underlying tasks. In this paper we propose the modeling of such uncertainty in 
terms of a distribution of the possible outcomes of a commitment, based on the statistical 
behavior of the tasks. 

A second source of uncertainty comes from the agent decision/planning process. As 
we know, flexibility is needed in order for the agent to operate in a dynamic environment. 
Therefore, when an agent’s beliefs and desires change, the agent should be able to change 
or revoke its commitments [13]. Hence, changes in the commitment can occur because 
of tasks not directly related to the fulfillment of the commitment. To the receiving 
agent, this can cause problems because its actions may depend on the honoring of 
the commitment by the offering agent. This aspect of uncertainty originates from the
existence of the commitment itself, not from the underlying tasks. In other words, it is
inherent in making the commitment, rather than arising from possible under-performance
of tasks, which is already addressed as the first source of uncertainty. In this paper we
take this uncertainty into account by explicitly describing the possibility of future
modification/revocation of the commitment. Contingency planning [15], a mechanism
for handling uncertain failures, is used in this work in order to reduce uncertainty and 
plan for possible future events such as failure or de-commitment. Also, a number of 
approaches have been proposed to handle this particular type of uncertainty, such as 
using a leveled commitments contracting protocol [17] and using option pricing schemes 
for evaluating contracts [18]. 

There is still another form of uncertainty, caused by the offering agent’s partial knowledge
of the agent that needs the commitment: how important or useful is the commitment to the
receiving agent, and when would the commitment be of little use to it? To tackle this
problem we define the marginal gain or loss [16] value of a commitment and use this value
to decide how the agents perform their reasoning and planning.

For coordination to be successful in the presence of these forms of uncertainty, there
must be structures that allow agents to interact predictably, and also flexibility for a
dynamic environment and imprecise viewpoints, in addition to local reasoning capability
[8]. For this purpose, we propose a domain-independent, flexible negotiation framework
for the agents to negotiate their commitments. Our work differs from conventions
and social conventions [12,13] in that our negotiation framework is domain-independent,
and allows the agent to integrate the negotiation process into problem solving and
dynamically reason about the local and social impact of changes of commitment, whereas
conventions and social conventions define a set of rules for the agents to reconsider their
commitments and ramifications to other agents when commitments change.

The rest of this paper is structured as follows. In Section 2 we discuss the modeling of 
commitments, focusing on the uncertainties we discussed above. Section 3 discusses the 
impact of uncertainty in commitments on planning and scheduling, in particular the use 
of contingency analysis. In Section 4 we discuss the negotiation framework for handling 
the commitments with uncertainty. Experimental results illustrating the strength of our
approach are provided in Section 5. We conclude with a brief summary in Section 6.
2 Uncertainty in Commitments

For the purpose of illustration, our discussion uses the TÆMS framework [7] for
modeling the agent task environment. This does not introduce loss of generality because
TÆMS is domain-independent and capable of expressing complex task environments. In
terms of reasoning and coordination using TÆMS, our discussion will focus on
the scheduling framework of Design-to-Criteria scheduling [22] and the Generalized
Partial Global Planning (GPGP/GPGP2) family of coordination mechanisms [6,19].
The basic building blocks in TÆMS are tasks, methods, and interrelationships. Figure
1 shows (partial) specifications of a task, a method, and an enables interrelationship.



(spec_task
  (label tB)
  (supertasks tA)
  (subtasks tC1 tC2)
  (qaf q_min)
  (deadline 100))

(spec_enables
  (label en1)
  (from m1)
  (to m2))

(spec_method
  (label m2)
  (supertasks tC2)
  (outcomes
    (o1
      (density 100%) ; only one outcome
      (quality_distribution 2 70% 5 30%)
      (duration_distribution 3 50% 4 50%)
      (cost_distribution 0 80% 1 20%))))



Fig. 1. TÆMS Objects



In TÆMS, agents’ problem solving knowledge is described in terms of tasks organized
to reflect the decomposition of a task into lower-level tasks (via subtasks
and supertasks), and the way the performance of lower-level tasks translates into
the performance of higher-level tasks (via quality accumulation functions, or qafs
for short). In Figure 1 the qaf q_min in task tB means that the quality of tB, q(tB), is
min(q(tC1), q(tC2)). This means that both tC1 and tC2 need to be accomplished (i.e.,
an AND relation).

Methods are atomic tasks (i.e., they have no subtasks), and the outcome of a method execution
is characterized by the (q, c, d) tuple, which indicates the quality achieved, the cost
incurred, and the duration of the execution. TÆMS allows uncertainty in method outcomes
by specifying discrete probability distributions of quality, cost, and duration.
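As a small illustration of these discrete distributions, the Python sketch below computes the expected quality, duration and cost of method m2 from Figure 1, and the quality distribution of a task whose qaf is q_min. The distribution given for tC1 is hypothetical, tC2's quality is taken to be m2's, and the q_min computation assumes the subtask outcomes are independent; that independence is exactly what interrelationships such as enables can break.

```python
from itertools import product

# Discrete TAEMS-style outcome distributions as {value: probability} dicts.
# The values for m2 are taken from Figure 1; the distribution used for tC1
# below is hypothetical, and independence of subtask outcomes is assumed.

m2 = {
    "quality": {2: 0.70, 5: 0.30},
    "duration": {3: 0.50, 4: 0.50},
    "cost": {0: 0.80, 1: 0.20},
}


def expected(dist):
    return sum(value * prob for value, prob in dist.items())


def q_min(*dists):
    """Quality distribution of a task with qaf q_min over independent subtasks."""
    result = {}
    for combo in product(*(d.items() for d in dists)):
        value = min(v for v, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        result[value] = result.get(value, 0.0) + prob
    return result


tC1_quality = {0: 0.20, 6: 0.80}                 # hypothetical subtask distribution
tB_quality = q_min(tC1_quality, m2["quality"])   # assumes q(tC2) is m2's quality
```

Here `expected(m2["quality"])` is 2.9 and the expected duration is 3.5 (up to floating-point rounding), while `tB_quality` puts probability 0.2 on quality 0, 0.56 on 2 and 0.24 on 5: under q_min, any chance of failure in one subtask passes straight through to the parent task.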

Obviously, since tasks are often interrelated, method executions cannot always be
assumed to be independent of each other, i.e., the outcome of one task/method may
affect the outcome distribution of another method. In TÆMS, such effects are captured
via interrelationships such as enables, facilitates, etc. For example, $T_a$ enables $T_b$ means
that $T_a$ must have achieved a positive quality before $T_b$ can start, essentially a precedence
constraint. In terms of conditional probability, this means that the quality of $T_b$ would
always be zero given that the quality of $T_a$ is zero. Similarly, the facilitates relationship
specifies the change of the outcome distribution of a method given that some other task
has achieved a quality above a certain threshold.



[Figure: method A1, with q (20% 0)(30% 2)(50% 6), c (100% 0), d (100% 60), enables method A2, with q (20% 0)(80% 6), c (100% 0), d (100% 60).]

Fig. 2. Uncertainty in Commitment 



The first step for incorporating uncertainty in commitments is to take into account
the uncertainty of underlying tasks. In [6], a commitment specifies only the expected
quality of the committed task. However, expected values often do not provide sufficient
information for effective coordination, especially when there are possible task failures.
For example, Figure 2 shows some tasks in the schedule of agent A. Suppose A offers
a commitment about method A2 to agent B, and assume A2 is to be enabled by
another local method A1. In this case, A2 itself has expected quality 4.8. But there is a
20% chance that method A2 will fail (q = 0), and thus cannot be useful to B. Furthermore,
because A2 is enabled by A1, which also fails 20% of the time, the result is that the
commitment has only a 64% chance of being useful to B. To address this problem, the
commitment should specify a distribution of possible outcomes, i.e.,
“64% chance $(t = 120) \wedge (q = 6)$, 36% chance $(t = 120) \wedge (q = 0)$”. In general, if $C(T)$ is a commitment
about task $T$, the outcome distribution of $C(T)$ (i.e., $p(C(T))$, or equivalently, the actual
outcome of task $T$, $p(T)$) depends not only on the outcome distribution of $T$ ($p^0(T)$,
which does not take into account the effect of interrelationships), but also on
the outcome distributions of the set of predecessor tasks of $T$. A predecessor task of
$T$ is a task that either enables $T$ or has some other interrelationship with $T$ that may
change the outcome distribution of $T$. The outcome of a predecessor task $M$ (i.e.,
$p(M)$) in turn depends on the outcomes of its own predecessors. In the simplest
case, assume that the only source of uncertainty is method quality and that
only enables interrelationships exist. Then, for $x > 0$ (the remaining probability mass falls on $q = 0$), the probability that the quality outcome equals
$x$ is

$$P(q(C(T)) = x) = \Big(\prod_{M \in \mathrm{pred}(T)} p(q(M) > 0)\Big) \cdot p^0(q(T) = x) \qquad (1)$$

Probability propagation for general cases, which involve duration and cost as well as
other types of interrelationships, can be derived similarly.
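Equation (1) can be applied recursively along enables chains. The Python sketch below does so for the quality-only, enables-only case described above, treating predecessor outcomes as independent (as the product in (1) implies) and assigning all remaining probability mass to quality 0. The function name and dictionary layout are assumptions made for the example; the numbers are those of Figure 2.

```python
def commitment_quality(task, base, preds):
    """Quality distribution of a commitment on `task`, following equation (1).

    base:  task -> {quality: probability}, the distribution p^0 ignoring interrelationships
    preds: task -> list of tasks that enable it (enables is the only interrelationship)
    Predecessor outcomes are treated as independent, as the product in (1) implies.
    """
    factor = 1.0
    for m in preds.get(task, []):
        m_dist = commitment_quality(m, base, preds)   # p(M) depends on M's own predecessors
        factor *= sum(p for q, p in m_dist.items() if q > 0)
    dist = {x: factor * p for x, p in base[task].items() if x > 0}
    dist[0] = 1.0 - sum(dist.values())                # all remaining mass means failure
    return dist


base = {
    "A1": {0: 0.2, 2: 0.3, 6: 0.5},
    "A2": {0: 0.2, 6: 0.8},
}
preds = {"A2": ["A1"]}

a2 = commitment_quality("A2", base, preds)
print(round(sum(q * p for q, p in base["A2"].items()), 6))   # 4.8, expected quality of A2 alone
print({q: round(p, 6) for q, p in sorted(a2.items())})        # {0: 0.36, 6: 0.64}
```

The output reproduces the example: although A2 on its own has expected quality 4.8, the commitment on A2 is useful to B only 64% of the time once A1's 20% failure rate is propagated.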

Next, because commitments are generally future-oriented, agents need to revise their
speculations about the future, and therefore also their decisions, over time. This
introduces uncertainty in decision making: in this case, uncertainty about whether
the agent respects or honors the commitment, in addition to the probabilistic outcome
of commitments. For instance, an agent may de-commit its commitments
during its problem solving process when keeping them conflicts with its
performance goal. As before, initially at time 0, agent A chooses the plan A1, A2 and
offers a commitment about A2 to B. However, at time 60, when A1 completes, in the
case that A1 fails or has q = 2, A may perform re-planning and select some alternative
plan that can produce a better quality outcome. Clearly, B should be able to learn this
information at time 60 rather than discovering at time 120 that the commitment is not in place.
More interestingly, if we can specify at time 0 that there is a possibility of
de-commitment at a future time (60), then B can take that possibility into account and
not depend heavily on this commitment. On the other hand, if at time 60 A1 finishes
with quality 6, then the quality outcome of the commitment is updated to a better
distribution, “80% q = 6 and 20% q = 0 at time 120”, because now q(A1) = 6. It would also
be helpful if B could be informed of this update. In other words, agent A can tell B,
“right now I pledge to do A2 before time 120. However, you may hear more information
about the commitment at time 60.” The additional future information may be good
(a better distribution) or bad (de-commitment). But the important thing is that the other agent,
B, can make arrangements ahead of time to prepare for such information, hence better
coordination.

One way to represent this uncertainty is to calculate p'(C(T), t), the probability that C(T) is still kept at time t. The exact calculation of p' depends on knowing (1) which events will trigger re-scheduling, and when, and (2) whether a future re-scheduling would change the commitments; in other words, it requires predicting future events, decisions, and actions. For complex systems, such information could be computationally expensive (if not impossible) to obtain. To avoid this problem, we do not calculate p' directly; instead, we focus on the first part: the events that may cause re-scheduling and thereby change commitments, for example, A1's possible failure (or low quality) at time 60. This occurs only 50% of the time, so in the other 50% no re-scheduling happens at time 60; therefore 50% is a lower bound on p'(60). It also follows that the commitment cannot change before time 60, because no re-scheduling occurs before then, i.e., p'(t) = 1 for t < 60. To agent B, this means that time 60 is a possible checkpoint for the commitment offered by A.
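The lower bound on p' can be sketched as a step function of time. This is a minimal illustration, not the authors' implementation: the function name is hypothetical, and it assumes (beyond the text) that when several checkpoints exist their triggers are independent, and that every trigger might revoke the commitment, which is what makes the result a lower bound rather than the exact p'.

```python
def keep_prob_lower_bound(t, checkpoints):
    """Lower bound on p'(C(T), t), the probability C(T) is still kept at t.

    checkpoints: list of (time, p_trigger) pairs, where p_trigger is the
    probability that the checkpoint event triggers re-scheduling.
    Assumes independent triggers and that any trigger may revoke the
    commitment (worst case), so the product is only a lower bound.
    """
    bound = 1.0
    for time, p_trigger in checkpoints:
        if time <= t:
            # The commitment certainly survives this checkpoint when no
            # re-scheduling is triggered there.
            bound *= 1.0 - p_trigger
    return bound


# One checkpoint: A1 fails or has q = 2 at time 60, 50% of the time.
checkpoints = [(60, 0.5)]
for t in (30, 59, 60, 120):
    print(t, keep_prob_lower_bound(t, checkpoints))
```

Before time 60 the bound is 1.0 (no re-scheduling can occur); from time 60 on it drops to 0.5, matching the reasoning above.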

Checkpoints are calculated by analyzing the schedule to find the times at which a failure or low performance of a method could seriously affect the agent's future performance goal. In the language of contingency analysis, the tasks in the critical region are critical to the agent's performance (and/or commitment), so their potential low-performance outcomes become the checkpoint events. The event information may include the time at which the event may occur, the task to be watched, the condition for re-scheduling (e.g., quality equals 0), and a lower bound for p'.
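The checkpoint event information listed above maps naturally onto a small record type. In this sketch the class and function names are hypothetical, and the distribution for A1 is an assumption: it is chosen only so that p(q(A1) > 0) = 0.8 and P(A1 fails or has q = 2) = 0.5, consistent with the numbers in the text.

```python
from dataclasses import dataclass


@dataclass
class CheckpointEvent:
    time: int             # when the watched task is expected to finish
    task: str             # task in the critical region to be watched
    bad_outcomes: tuple   # qualities that would trigger re-scheduling
    lower_bound: float    # lower bound on p' contributed by this event


def checkpoint_for(task, finish_time, quality_dist, bad_outcomes):
    """Build a checkpoint event for one critical task.

    The lower bound is the probability that none of the bad outcomes
    occurs, i.e. that this event does not trigger re-scheduling.
    """
    p_bad = sum(p for q, p in quality_dist.items() if q in bad_outcomes)
    return CheckpointEvent(finish_time, task, tuple(bad_outcomes), 1.0 - p_bad)


# Hypothetical quality distribution for A1 (see the lead-in).
a1 = {0: 0.2, 2: 0.3, 6: 0.5}
event = checkpoint_for("A1", 60, a1, bad_outcomes=(0, 2))
print(event)
```

An offering agent could attach a list of such events to each commitment, which is the role of the `update` field in Fig. 3.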

The third source of uncertainty comes from partial knowledge of the other agent: how important is this commitment to others? To answer this question, we first need to know how important the commitment is to the agent itself. Knowing this lets us avoid bad coordination situations, such as offering (and paying the high cost of honoring) a commitment that is of little value to the receiving agent or, conversely, canceling a commitment that is very important to the agent that needs it for only a small gain in local performance.
To solve this problem, we use the notion of marginal cost and define the marginal loss as the difference between the agent's performance without making the commitment and its performance when making the commitment. A zero marginal loss means the commitment is “free”, i.e., the offering agent would do the same thing with or without making the commitment, as in the case of A's commitment on A2. Like quality values, marginal loss values depend on future outcomes and can change over time. For example, the same commitment on A2 would incur a marginal loss if A1 finishes with quality 2, because in that case the alternative plan (A3, A4) would have a higher expected local quality. Similarly, we define the marginal gain as the difference between the agent's performance when receiving the commitment and its performance without receiving it. A marginal gain of zero indicates that the receiving agent is indifferent to the commitment.

Marginal gain/loss can be expressed in terms of utility values (or distributions of utility values), in this case task qualities. However, agents may use different utility scales, so we use the relative importance to indicate how quality values in the other agent translate into quality values in this agent. For example, agent A may believe that utility in agent B has importance 2.0, i.e., one unit of utility in B is worth two units in A. This implies that a marginal gain of 5 in B can offset a marginal loss of 10 in A. Clearly, a rational agent would try to maximize its local utility plus the marginal gain in other agents, minus the marginal loss due to the commitments it offered. For simplicity, we do not address the importance issue any further here and assume an importance value of 1.0 throughout, i.e., the quality scales are the same in all agents. In general, though, a simple importance rating is not enough to characterize an agent's utility model or the group utility function; a more complex model, such as the one proposed by Wagner and Lesser [20], could be used.
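The importance-weighted trade-off can be written as a one-line function. This sketch (function name hypothetical) computes only the commitment-related terms, importance × (marginal gain in the other agent) − (own marginal loss); an agent would add this to its local utility when comparing its options. The example reproduces the text's numbers.

```python
def net_commitment_value(marginal_gain_other, marginal_loss_own, importance=1.0):
    """Net value of a commitment, expressed in the offering agent's scale.

    importance: how much one unit of utility in the other agent is worth
    in this agent (1.0 means both agents use the same quality scale).
    """
    return importance * marginal_gain_other - marginal_loss_own


# Importance 2.0: a gain of 5 in B exactly offsets a loss of 10 in A.
print(net_commitment_value(5.0, 10.0, importance=2.0))  # 0.0
# With the equal scales assumed in the rest of the paper, it does not.
print(net_commitment_value(5.0, 10.0))  # -5.0
```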

To evaluate the marginal gain/loss of a particular commitment, we simply compare the best-quality alternatives with and without the commitment and use the difference as the marginal gain/loss.
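The comparison of best alternatives can be sketched as follows. The plan qualities are hypothetical expected values, not taken from the paper, and "honoring a commitment" is simplified to "the plan contains the committed task" (finish-time constraints are ignored). With A's best plan already containing A2, the commitment on A2 comes out “free”, as the text describes.

```python
def best_quality(alternatives, required_task=None):
    """Best expected quality among plans, optionally only plans containing required_task."""
    candidates = [q for plan, q in alternatives.items()
                  if required_task is None or required_task in plan]
    return max(candidates)  # raises ValueError if no plan qualifies


def marginal_loss(alternatives, committed_task):
    """Offering agent: best quality without the commitment minus best quality with it."""
    return best_quality(alternatives) - best_quality(alternatives, committed_task)


def marginal_gain(quality_with, quality_without):
    """Receiving agent: best quality with the commitment minus best quality without it."""
    return quality_with - quality_without


# Hypothetical expected qualities for agent A's two plans.
plans_a = {("A1", "A2"): 5.0, ("A3", "A4"): 4.0}
print(marginal_loss(plans_a, "A2"))  # 0.0: committing to A2 is "free"
print(marginal_loss(plans_a, "A4"))  # 1.0
# Hypothetical best qualities for B with and without a received commitment.
print(marginal_gain(4.0, 2.0))       # 2.0
```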

Based on the above discussion, Figure 3 shows the richer TÆMS specification of an example commitment, which pledges to do task A2.



3 The Impact on Planning and Scheduling 

Now that a commitment has uncertainty associated with it, agents can no longer regard it as guaranteed or assume the absence of failures, so planning and scheduling within an agent become harder. The benefit of modeling uncertainty, however, is a better understanding of commitments among the agents and therefore more effective coordination. To achieve this, we also need to change the local scheduling/planning activities. Traditionally, when the uncertainty of a commitment is overlooked and the commitment is assumed to be failure-proof, re-scheduling is performed reactively, to handle an unexpected failure that blocks further execution. Such a reaction is forced on the agent rather than planned ahead, and in a time-sensitive environment it often comes too late. It is therefore desirable for the agent to plan in anticipation of possible failures and to know its options if failures do occur. This way, the necessary arrangements can be made before a failure occurs, and the effort of re-scheduling is saved by adopting a pre-planned action when it does.
(spec_uncertain_commitment
  (label com1) (from_agent agentA) (to_agent agentB)
  (task A2)
  (type deadline)
  (outcomes ;; uncertain outcomes
    (o1
      (density 100%)
      (quality 6 64% 0 36%) (finish_time 120 100%)))
  (update ;; list of possible checkpoints
    (u1
      (lowerbound 50%) (update_time 60 100%)))
  (marginal_loss 0.0) ;; no marginal cost to agent A
)

Fig. 3. Commitment that incorporates uncertainties
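For readers who want to manipulate such a specification programmatically, the commitment in Fig. 3 can be mirrored in a small data structure. This is a hypothetical Python rendering, not the TÆMS format itself: it flattens the single outcome class o1 (density 100%) into one quality and one finish-time distribution, and it represents checkpoints as (update_time, lower_bound) pairs.

```python
from dataclasses import dataclass, field


@dataclass
class UncertainCommitment:
    label: str
    from_agent: str
    to_agent: str
    task: str
    kind: str                # the "type" field in Fig. 3, e.g. "deadline"
    quality: dict            # quality -> probability
    finish_time: dict        # finish time -> probability
    checkpoints: list = field(default_factory=list)  # (update_time, lower_bound)
    marginal_loss: float = 0.0

    def expected_quality(self):
        return sum(q * p for q, p in self.quality.items())

    def is_valid(self):
        """Distributions sum to 1 and every lower bound lies in [0, 1]."""
        dists_ok = all(abs(sum(d.values()) - 1.0) < 1e-9
                       for d in (self.quality, self.finish_time))
        return dists_ok and all(0.0 <= lb <= 1.0 for _, lb in self.checkpoints)


com1 = UncertainCommitment(
    label="com1", from_agent="agentA", to_agent="agentB", task="A2",
    kind="deadline", quality={6: 0.64, 0: 0.36}, finish_time={120: 1.0},
    checkpoints=[(60, 0.5)], marginal_loss=0.0)
print(com1.is_valid(), round(com1.expected_quality(), 2))  # True 3.84
```

The expected quality of 3.84 (= 6 × 0.64) is the single number a commitment without uncertainty information would carry; the richer record keeps the full distribution and the checkpoint at time 60.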



To handle possible failure outcomes in commitment, we use contingency analysis in 
conjunction with Design-to-Criteria scheduling. Due to space limitations, we cannot
describe the details of contingency analysis here; details are available in [21]. In our 
approach, a failure in the commitment can be treated the same way as a failure in a local 
task. First, we analyze the possible task failures (or low quality outcomes) or commitment 
failures and identify alternatives that may improve the overall quality outcomes when 
failure occurs. Through contingency analysis, the resulting schedule is no longer a linear 
sequence of actions, as it is with ordinary scheduling; rather it has a branching structure 
that specifies alternatives and the conditions for taking the alternatives. 
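The branching structure of a contingency schedule can be sketched as a tree in which each step runs a method and then branches on its quality outcome. In this sketch, the class and function names, the alternative method `alt`, and all distributions except A2's are hypothetical, and summing qualities along a path is a simplifying assumption rather than the TÆMS quality semantics. The expected value is computed over all branches, which is what makes it more informative than the value of a single linear path.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Step:
    """Run one method, then branch on its quality outcome."""
    method: str
    outcomes: Dict[int, float]                                # quality -> probability
    branches: Dict[int, "Step"] = field(default_factory=dict)  # quality -> next step


def expected_value(step: Optional[Step]) -> float:
    """Expected total quality of a branching schedule (qualities summed along a path)."""
    if step is None:
        return 0.0
    return sum(p * (q + expected_value(step.branches.get(q)))
               for q, p in step.outcomes.items())


def next_step(step: Step, observed_quality: int) -> Optional[Step]:
    """During execution, follow the branch matching the observed outcome."""
    return step.branches.get(observed_quality)


alt = Step("alt", {3: 0.9, 0: 0.1})        # hypothetical fallback method
second = Step("A2", {6: 0.8, 0: 0.2})
schedule = Step("A1", {0: 0.2, 2: 0.3, 6: 0.5}, {0: alt, 2: alt, 6: second})
print(round(expected_value(schedule), 2))  # 7.35
print(next_step(schedule, 2).method)       # alt
```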

To illustrate this, Figure 4 shows an example of the task structures in agents A and B. Note the relations “A2 enables B2” and “.41 enables // l”. They involve tasks in different agents and are therefore called non-local effects (NLEs). The existence of NLEs drives the need for coordination.

Assume that both agents try to maximize their quality outcomes and that both have a deadline of 160. Based on the highest estimated utility, A would initially select the schedule (A1, A2) and B would select (B1, B2). Then, after the agents detect the NLE between A2 and B2, A would proactively pledge to complete A2 by time 120, with some estimated quality.

In Figure 5, (a) shows the linear schedules of agents A and B, and (b) shows the schedules with contingencies. The linear schedule specifies only the preferred path in the contingency schedule, whereas a contingency schedule specifies a set of paths based on possible future outcomes. With contingency analysis, the value of a schedule is computed from this branching structure and is therefore more accurate. To exploit this branching structure, we need to monitor the progress of execution and dynamically discover and analyze possible future branches; this is closely related to monitoring an anytime search process in the solution space (the set of possible execution paths), as in the work of [10].
Task structures of agents A and B. Legend: square = method, circle = task, plain line = subtask relation, arrow = enables relation.

Method annotations (quality q, cost c, duration d, each given as probability and value):
q(20% 0)(80% 6), c(100% 0), d(100% 60)
q(40% 2)(60% 4), c(100% 0), d(100% 20)
q(10% 0)(30% 3)(60% 6), c(100% 0), d(100% 70)
q(10% 0)(90% 3), c(100% 0), d(100% 30)

Fig. 4. Example Task Structure



Contingency analysis can also be used to handle uncertainty that originates from changing or revoking commitments. As mentioned before, we can identify the critical regions in the schedule, where a failure would have a significant impact on the overall quality; this leads to the discovery of checkpoints. Conversely, once we have the checkpoint information regarding a commitment, we can build contingency schedules that specify a recovery option. Let T^a denote that task T has outcome a; for example, T^F denotes the failure of T, and T^2 denotes q = 2. Then we can specify a recovery option for (B1, B2) such as (B1^2, A1^F, B3), to indicate that when B1 finishes with q = 2 and A1 fails, the agent should run B3. This generalizes the previous case, since conceptually we can regard the failure of a commitment as a type of de-commitment that occurs at the commitment's finish time.
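The T^a notation suggests a simple rule table: each recovery option pairs a set of observed outcomes with the action to take. A minimal sketch with hypothetical names, encoding the (B1^2, A1^F, B3) example. It assumes that B learns A1's outcome through a commitment update at the checkpoint.

```python
FAIL = "F"  # outcome label for failure, as in T^F


def matches(condition, observed):
    """True if every (task, outcome) pair in the condition has been observed."""
    return all(observed.get(task) == outcome for task, outcome in condition.items())


def recovery_action(options, observed):
    """Return the action of the first recovery option whose condition holds."""
    for condition, action in options:
        if matches(condition, observed):
            return action
    return None  # no recovery needed: keep following the preferred path


# (B1^2, A1^F, B3): if B1 finishes with q = 2 and A1 fails, run B3.
options = [({"B1": 2, "A1": FAIL}, "B3")]
print(recovery_action(options, {"B1": 2, "A1": FAIL}))  # B3
print(recovery_action(options, {"B1": 2, "A1": 6}))     # None
```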

The use of marginal gain/loss becomes very important in scheduling and coordination. Although our model of commitments allows changes and de-commitments (unlike the traditional case, where commitments are assumed to be fixed and failure-free), these changes are social rather than local decisions. The introduction of marginal gain/loss ensures that commitments are properly respected in a social context. If the overall utility of a multi-agent system is the sum of the utilities in each agent, and the importance of activities in different agents is normalized, then a commitment is socially worthwhile only when its marginal gain is greater than its marginal loss. Likewise, a commitment should be revoked only when its marginal loss is greater than its marginal gain. The difference between the marginal gain(s) and loss(es) becomes the utility of the commitment itself (which is different from the utility of the task being pledged). Therefore, the social utility of a schedule is the local utility of the schedule plus the marginal gain of the commitments received, minus the marginal loss of the commitments offered. Note that marginal gain/loss also changes during the course of problem solving, so it needs to be re-evaluated as tasks finish.

Fig. 5. Schedules with contingency
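The decision rules in the preceding paragraph translate directly into code. A minimal sketch with hypothetical function names, assuming normalized importance (1.0) as the text does; the numeric values in the example are hypothetical.

```python
def social_utility(local_utility, gains_received=(), losses_offered=()):
    """Local utility plus marginal gains of commitments received,
    minus marginal losses of commitments offered (importance = 1.0)."""
    return local_utility + sum(gains_received) - sum(losses_offered)


def socially_worthwhile(marginal_gain, marginal_loss):
    """A commitment is worth making only if its gain exceeds its loss."""
    return marginal_gain > marginal_loss


def should_revoke(marginal_gain, marginal_loss):
    """Revoke only if the loss now exceeds the gain; re-check as tasks finish."""
    return marginal_loss > marginal_gain


# Hypothetical values: gain 2.0 to the receiver, loss 0.5 to the offerer.
print(socially_worthwhile(2.0, 0.5))               # True
print(social_utility(4.0, losses_offered=[0.5]))   # 3.5
```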

4 The Negotiation Framework 

In order to add flexibility to coordination, we also introduce a commitment negotiation 
framework that allows agents to interact with each other in order to achieve better 
coordination. This negotiation framework provides the following primitives for agent
negotiation (here RA stands for the agent requesting/receiving the commitment, and OA 
for the agent offering the commitment): 

- request: RA asks an agent to make a commitment regarding a task. Additional information includes the desired parameters of the commitment (task, quality, finish time, etc.) as well as the marginal gain information.

- propose: OA offers a commitment to one agent. Additional information includes the 
commitment content (with uncertainty associated) and possible marginal loss. 

- accept: RA accepts the terms specified in OA's commitment.

- decline: RA chooses not to use OA's offer. This can happen when RA does not find the offer attractive but does not generate a counter-proposal.

- counter: RA requests a change in the parameters of the offered commitment, i.e., makes a counter-proposal. Changes may include better quality or quality certainty (i.e., a better distribution), a different finish time, or earlier/later possible checkpoints/re-scheduling times.

- change: OA makes changes to the commitment. The change may reflect the OA’s 
reaction/compromise to RA’s counter-proposal. Of course, the RA may again use 
the counter primitive to react to this modified commitment as necessary, until both 
sides reach consensus. 

- no-change: If the OA cannot make a change to the commitment according to the 
counter-proposal, it may use this primitive to signal that it cannot make a compro- 
mise. 
- decommit: OA cancels its offer. This may be a result of agent re-planning.

- update: both RA and OA can provide updated or more accurate information regarding a commitment, such as changes in marginal gain/loss or in the uncertainty profile of the commitment during the course of problem solving.

- fulfilled: the committed task was accomplished by OA.

- failure: the commitment failed (due to unfavorable task outcomes).
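The primitives above can be captured as an enumeration together with the role allowed to send each one. This is a sketch under stated assumptions: the names are hypothetical, and the senders of fulfilled and failure (taken to be OA) are not stated explicitly in the text. The example trace is one plausible message sequence for the A4 exchange discussed in the following paragraph.

```python
from enum import Enum


class Primitive(Enum):
    REQUEST = "request"
    PROPOSE = "propose"
    ACCEPT = "accept"
    DECLINE = "decline"
    COUNTER = "counter"
    CHANGE = "change"
    NO_CHANGE = "no-change"
    DECOMMIT = "decommit"
    UPDATE = "update"
    FULFILLED = "fulfilled"
    FAILURE = "failure"


# Role allowed to send each primitive (RA = requesting/receiving agent,
# OA = offering agent). Senders of FULFILLED/FAILURE are assumed to be OA.
SENDERS = {
    Primitive.REQUEST: {"RA"}, Primitive.PROPOSE: {"OA"},
    Primitive.ACCEPT: {"RA"}, Primitive.DECLINE: {"RA"},
    Primitive.COUNTER: {"RA"}, Primitive.CHANGE: {"OA"},
    Primitive.NO_CHANGE: {"OA"}, Primitive.DECOMMIT: {"OA"},
    Primitive.UPDATE: {"RA", "OA"},
    Primitive.FULFILLED: {"OA"}, Primitive.FAILURE: {"OA"},
}


def valid_trace(trace):
    """Check that every (role, primitive) message is sent by an allowed role."""
    return all(role in SENDERS[prim] for role, prim in trace)


# A proposes A2, B counters with A4, A changes the terms, B accepts.
trace = [("OA", Primitive.PROPOSE), ("RA", Primitive.COUNTER),
         ("OA", Primitive.CHANGE), ("RA", Primitive.ACCEPT)]
print(valid_trace(trace))  # True
```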

These primitives are used not only while a commitment is being established, but also during the problem solving process, so they allow agents to negotiate and communicate their commitments dynamically. The negotiation process helps agents become better informed about each other's desires, intentions, and outcomes, thereby reducing the uncertainty in commitments and resulting in better coordination. For example, if at time 0 agent A offers B a commitment to complete A2 before time 120, agent B can see that this commitment is useless and counter-propose that agent A commit to task A4 before time 130. If such a commitment were offered with 100% certainty, the marginal gain would be 2.6. However, agent A can offer only 90% certainty on A4, and such a commitment would cause a marginal loss of 0.72, which is acceptable to both agents. Clearly, the negotiation process helps agents discover alternative commitments that lead to better social solutions. This is done by using marginal gain/loss information in the negotiation; without this information, the agents' coordination decisions would be based on local information only.

Under this framework, each agent can implement a policy using the primitives, which 
decides its communication protocol based on the negotiation strategy the agent will use 
to carry out the negotiation. The policy decides issues such as what parameters to choose 
when requesting/offering a commitment, how much effort (time and iterations) the agent 
is willing to spend on the negotiation, and how often the agent updates its commitments, 
etc. For example, an agent can choose to neglect counter-proposals if it cannot afford 
the planning cost or does not have the capability to reason about counter-proposals. The 
policies are often domain-dependent, and the reasoning of the policies is beyond the 
scope of this paper. A formal account of the reasoning models for negotiation to form 
a joint decision is provided in [9]. In a general sense, negotiation can be viewed as a 
distributed search problem, and the policies reflect how the agents relax their constraints 
and search for compromises, such as the work of [14]. In this work, we use a simple policy that counter-proposes only when the offered commitment brings no overall gain (i.e., the marginal gain is less than the marginal loss). If a counter-proposal cannot be found, the agent simply declines the commitment.
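The simple policy can be sketched as a single decision function for the receiving agent. The function name and the `find_counter_proposal` callback are hypothetical. Treating a tie (gain equal to loss) as “no overall gain” is a choice made here, not something the text specifies. The example uses the A2 offer discussed above, which is useless to B (gain 0) and free for A (loss 0).

```python
def respond(marginal_gain, marginal_loss, find_counter_proposal):
    """RA's simple policy for an offered commitment.

    Accept when the offer brings an overall gain; otherwise try to
    counter-propose, and decline if no counter-proposal can be found.
    find_counter_proposal: hypothetical callback returning the desired
    commitment parameters, or None if there is no alternative.
    """
    if marginal_gain > marginal_loss:
        return ("accept", None)
    counter = find_counter_proposal()
    if counter is not None:
        return ("counter", counter)
    return ("decline", None)


# No overall gain from A2 by time 120, so B counter-proposes A4 by time 130.
print(respond(0.0, 0.0, lambda: {"task": "A4", "finish_by": 130}))
# Without any alternative, B simply declines.
print(respond(0.0, 0.0, lambda: None))  # ('decline', None)
```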



5 Experiments 

To validate our approach, we implemented a generic agent that works with a textual TÆMS input. We simulate two instances of such an agent, A and B, working on the task structures presented in Figure 4. We perform several comparisons to show how handling uncertainty improves coordination and therefore overall performance. We assume that both agents have a deadline of 160 and that both try to maximize quality outcomes.
First, we study the base case, where commitments do not carry uncertainty information and no negotiation is used: one agent proactively offers a commitment to the other agent, using only the expected quality and finish time. Figure 6 shows the distribution of the final quality outcomes over 200 runs. Three histograms are shown in this figure: the quality of A, the quality of B, and their sum.






Fig. 6. Base Case

Fig. 7. Second Case: With Uncertainty



From the trace, we observed that A's commitment to finish A2 by time 120 does not leave agent B enough time to finish task B2 by its deadline of 160. However, without negotiation, B cannot find out whether A2 could be delivered earlier, and the agents cannot discover an alternative commitment on task A4, since each agent settles on its best local alternative: (A1, A2) for A and (B1) for B.

In the second case, we add uncertainty information to the commitments. Commitments are still offered proactively (with no negotiation), but the agents can use contingency planning to reduce the uncertainty in commitments. Figure 7 shows the results for 200 runs. Here we see a slight improvement in the quality outcomes of both agents, but the similar pattern of the histograms indicates that this has only a minor impact on scheduling. Essentially, agents now produce contingency plans so that they can switch to a better alternative when an undesirable situation occurs. This can be viewed as an incremental improvement to the existing plan, so it does not change the agents' behavior pattern significantly. Without negotiation, the improvements are restricted to each agent's local activities. For example, when A1 finishes with quality 2, A switches to plan (A3, A4) instead of continuing to run A2 (thereby effectively de-committing), because (A3, A4) now has a higher expected utility.

Fig. 8. Third Case: Negotiation

In the last case, we incorporate negotiation and use the marginal gain/loss information in commitment coordination. The results, shown in Figure 8, have very different histogram patterns, indicating major changes in the agents' activities. A now has a relatively lower quality outcome than in the previous cases, but B's performance improves significantly, and the sum of their qualities improves significantly overall. This is because the agents are now able to find a better commitment between them (namely, the commitment on A4) through negotiation with marginal costs. This commitment is social in that it helps achieve better overall utility, although not every agent gains locally. The following table shows the average quality outcomes in each case.



Average Q    A      B      sum(A,B)
Case 1       3.3    2.32   5.62
Case 2       3.49   2.67   6.16
Case 3       2.53   4.6    7.13
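As a quick arithmetic check on the table, the sum column equals A + B in every case, and relative to the base case the summed quality rises by roughly 9.6% in Case 2 and 26.9% in Case 3. The short script below recomputes these figures from the table values.

```python
# Average quality per case (A, B, sum(A,B)), copied from the table above.
results = {
    "Case 1": (3.3, 2.32, 5.62),
    "Case 2": (3.49, 2.67, 6.16),
    "Case 3": (2.53, 4.6, 7.13),
}

base = results["Case 1"][2]
changes = {}
for case, (a, b, total) in results.items():
    assert abs(a + b - total) < 1e-9  # the last column is A + B
    changes[case] = 100.0 * (total - base) / base
    print(f"{case}: sum {total:.2f}, {changes[case]:+.1f}% vs. Case 1")
```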



This also shows that integrating all the mechanisms (negotiation, contingency planning, and marginal gains/losses) is very important for handling uncertainties effectively. These mechanisms handle different aspects of uncertainty, and they work together to achieve better coordination.

6 Conclusion 

In conclusion, we identified three sources of uncertainty inherent in commitments and 
discussed the ways to incorporate them into the modeling of commitments, and the 
mechanisms to handle the uncertainties, such as contingency analysis and negotiation. 
The goal of this work is to improve coordination effectiveness, and ultimately, to improve 
the overall utility of the multi-agent problem solving. Our results indicate that these mechanisms significantly improve coordination.
Our work so far has focused on cooperative agents rather than self-interested agents. In cooperative multi-agent systems, the agents' goal is to increase the expected total group utility. Note that all three types of uncertainty discussed here exist in the same way for commitments among self-interested agents. However, fully selfish agents would normally not exchange their marginal gains or losses, so negotiation must be based on some other utility exchange model. It is interesting to note
that self-interested behavior is not always desirable in agent societies, so a balanced 
model of individual and social utility may be used [2,11]. Also, many researchers are 
now looking at deliberative agents [1,4], where agents’ social stance could be situation- 
specific. Again, in these cases an agent’s utility model would also consider the society 
in which they act. 

With the introduction of uncertainties in our model of commitments, our approach 
is computationally more expensive than previous approaches where uncertainties are 
not explicit, especially when distributions are propagated in the analysis and when the number of contingent plans increases. One way to manage the complexity is to recognize
that the analysis of possible future contingency plans can be an anytime process, and 
therefore we may trade off accuracy with the effort of analysis. Heuristics for effectively 
pruning the search space can also be applied. 

The ability to handle uncertainty in commitments is especially important in a time-sensitive environment where agents cannot afford to re-schedule when failures occur. Hence, by taking into account the possibility of failure, this work also improves the reliability of problem solving. As the next step, we will address the issues related to
the more general problem of fault-tolerance in multi-agent systems. Interesting issues 
may include: how to handle the uncertainty related to new (and possibly important) 
tasks arriving to an agent (which in turn may affect scheduling), the cost of dynamic 
monitoring, and the adaptive management of redundancies for fault-tolerance, etc. 
1. Evaluating Agent Architectures

Stuart Russell [14] describes rational agents as “those that do the right thing”. The 
problem of designing a rational agent then becomes the problem of figuring out 
what the right thing is. There are two approaches to the latter problem, depending 
upon the kind of agent we want to build. On the one hand, anthropomorphic 
agents are those that can help human beings rather directly in their intellectual 
endeavors. These endeavors consist of decision making and data processing. An 
agent that can help humans in these enterprises must make decisions and draw 
conclusions that are rational by human standards of rationality. Anthropomorphic 
agents can be contrasted with goal-oriented agents — those that can carry out 
certain narrowly-defined tasks in the world. Here the objective is to get the job 
done, and it makes little difference how the agent achieves its design goal. 

If the design goal of a goal-oriented agent is sufficiently simple, it may be 
possible to construct a metric that measures how well an agent achieves it. Then
the natural way of evaluating an agent architecture is in terms of the expected-value 
of that metric. An ideally rational goal-oriented agent would be one whose design 
maximizes that expected-value. The recent work on bounded-optimality ([3], [15], 
[20], etc.) derives from this approach to evaluating agent architectures. However, 
this approach will only be applicable in cases in which it is possible to construct a 
metric of success. If the design goal is sufficiently complex, that will be at least 
difficult, and perhaps impossible. 

This paper will focus on anthropomorphic agents. For such agents, it is the 
individual decisions and conclusions of the agent that we want to be rational. In 
principle, we could regard an anthropomorphic agent as a special case of a goal- 
oriented agent, where now the goal is to make rational decisions and draw rational 
conclusions, but it is doubtful that we can produce a metric that measures the 
degree to which such an agent is successful in achieving these goals. It is important 
to realize that even if we could construct such a metric it would not provide an 
analysis of rationality for such an agent, because the metric would have to measure 
the degree to which the agent’s individual cognitive acts tend to be rational. Thus 
it must presuppose prior standards of rationality governing individual cognitive 
acts. 

In AI it is often supposed that the standards of rationality that apply to 
individual cognitive acts are straightforward and unproblematic, viz., Bayesian 
probability theory provides the standards of rationality for beliefs, and classical 
decision theory provides the standards of rationality for practical decisions. It may come as a surprise, then, that most philosophers reject Bayesian epistemology, and I believe there are compelling reasons for rejecting classical decision theory. Space
precludes a detailed discussion of this issue, but I will say just a bit about why I 
think these theories should be rejected. 

Bayesian epistemology asserts that the degree to which a rational agent is 
justified in believing something can be identified with a subjective probability. 
Belief updating is governed by conditionalization on new inputs. There is an 
immense literature on this. Some of the objections to it are summarized in Pollock 
and Cruz [11]. Perhaps the simplest objection to Bayesian epistemology is that it 
implies that an agent is always rational in believing any truth of logic, because any 
such truth has probability 1. This conclusion conflicts with common sense. Consider a complex tautology like [P ↔ (Q & ~P)] → ~Q. If one of my logic students picks
this out of the air and believes it for no reason, we do not regard that as rational. 
He should only believe it if he has good reason to believe it. In other words, 
rational belief requires reasons, and that conflicts with Bayesian epistemology. 
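That the example formula really is a tautology, and so receives probability 1 under any probability assignment, can be confirmed mechanically by a brute-force truth-table check. A minimal sketch; the helper names are illustrative only.

```python
from itertools import product


def implies(a, b):
    """Material conditional a -> b."""
    return (not a) or b


def formula(p, q):
    # [P <-> (Q & ~P)] -> ~Q, with <-> rendered as equality of truth values
    return implies(p == (q and not p), not q)


# True in all four rows of the truth table, hence a tautology.
print(all(formula(p, q) for p, q in product([True, False], repeat=2)))  # True
```

The antecedent holds only when P and Q are both false, and in that row ~Q is true, which is why no row falsifies the formula.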

Classical decision theory has us choose acts one at a time on the basis of 
their expected values. What is wrong with this is that it is courses of action or 
plans that must be evaluated decision-theoretically, and individual acts become 
rational by being prescribed by rationally adopted plans. (See Pollock [8].) This 
suggests a minor modification to classical decision theory which applies it to plans 
rather than acts. On this view, a plan is adoptable just in case it has a higher 
expected value than any of its competitors. However, I will argue below that this does not work either. The evaluation of plans in terms of their expected values is more complicated than this. The difficulty arises from the fact that plans, unlike
acts, are structured objects that can embed one another and aim at different (more 
or less comprehensive) sets of goals. 

The design of an anthropomorphic agent requires a general theory of rational 
cognition. The agent’s cognition must be rational by human standards. Cognition 
is a process, so this generates an essentially procedural concept of rationality. 
Many AI researchers have followed Herbert Simon [16] in rejecting such a procedural 
account, endorsing instead a satisficing account based on goal-satisfaction, but that 
is not applicable to anthropomorphic agents. 

There is a problem, however, concerning how to understand procedural ratio- 
nality. We do not necessarily want an anthropomorphic agent to model human 
cognition exactly. For example, there is a robust psychological literature [17] 
indicating that humans have to learn modus tollens. Modus ponens seems to be 
built into our cognitive architecture, but modus tollens is not. But surely there 
would be nothing wrong with building modus tollens into the inferential repertoire 
of an anthropomorphic agent. We want such an agent to draw rational conclusions 
and make rational decisions, but it need not do so in exactly the same way humans 
do it. How can we make sense of this? Stuart Russell [15] (following Herbert 
Simon) suggests that the appropriate concept of rationality should only apply to the 
ultimate results of cognition, and not the course of cognition. To make this more 
precise, let us say that a conclusion or decision is warranted (relative to a system 
of cognition) iff it is endorsed “in the limit”, i.e., there is some stage of cognition at 
which it becomes endorsed and beyond which the endorsement is never retracted.
Then we might require an agent architecture to have the same theory of warrant as 
human rational cognition. This is to evaluate its behavior in the limit. 

However, there is a problem for any assessment of agents in terms of the 
results of cognition in the limit. An agent that drew conclusions and made decisions 
at random for the first ten million years, and then started over again reasoning just 
like human beings would have the same theory of warrant, but it would not be a 
good agent design. 

It looks like the most we can require is that the agent’s reasoning never 
strays very far from the course of human reasoning. If humans will draw a 
conclusion within a certain number of steps, the agent will do so within a “comparable” 
number of steps, and if a human will retract the conclusion within a certain number 
of further steps, the agent will do so within a “comparable” number of further 
steps. However, it has to be admitted that this is vague. A natural proposal for 
making this more precise might be to require that the worst-case difference be 
polynomial in the number of steps, but this seems pretty artificial. Furthermore, 
this particular proposal would not rule out the agent that drew conclusions randomly 
for the first ten million years. 

I am not going to endorse a solution to this problem. I just want to call 
attention to it, and urge that whatever the solution is, it seems reasonable to think 
that the kind of architecture I am about to describe satisfies the requisite constraints. 



2. The OSCAR Architecture 

OSCAR is an architecture for rational agents based upon an evolving philosophical 
theory of rational cognition. The general architecture is described in Pollock [8], 
and related papers can be downloaded from http://www.u.arizona.edu/~pollock.

OSCAR is based on a schematic view of rational cognition according to 
which agents have beliefs representing their environment and an evaluative mech- 
anism that evaluates the world as represented by their beliefs. They then engage in 
activity designed to make the world more to their liking. This is diagrammed in 
figure 1. This schematic view of rational cognition makes it natural to distinguish
between epistemic cognition, which is cognition about what to believe, and practical 
cognition, which is cognition about what to do. We can think of the latter as 
including goal selection, plan construction, plan selection, and plan execution. 

It is probably fair to say that most work on rational agents in AI has focussed 
on practical cognition rather than epistemic cognition, and for good reason. The 
whole point of an agent is to do something, to interact with the world, and such 
interaction is driven by practical cognition. From this perspective, epistemic cognition 
is subservient to practical cognition. 

Figure 1. Doxastic-conative loop

The OSCAR architecture differs from most agent architectures in that, although 
it is still practical cognition that directs the agent’s interaction with the world, most 
of the work in rational cognition is performed by epistemic cognition. Practical 
cognition evaluates the world (as represented by the agent’s beliefs), and then 
poses queries concerning how to make it better. These queries are passed to 
epistemic cognition, which tries to answer them. Plans are produced by reasoning 
about the world (epistemic cognition). Competing plans are then evaluated and 
selected on the basis of their expected utilities, and those expected utilities are 
again computed by epistemic cognition. Finally, plan execution generally requires 
a certain amount of monitoring to verify that things are going as planned, and that 
monitoring is again carried out by epistemic cognition. In general, choices are 
made by practical cognition, but the information on which the choices are based is 
the product of epistemic cognition, and the bulk of the work in rational cognition 
goes into providing that information. 

Epistemic cognition and practical cognition interact in important ways. The 
point of epistemic cognition is to provide the information required for practical 
cognition. This information is encoded in the form of beliefs, and the beliefs are 
used by all three modules of practical cognition, as diagrammed in figure 2. 

Because the purpose of epistemic cognition is to provide the information used by
practical cognition, the course of epistemic cognition must be driven by
practical cognition. It tries to answer questions posed in the pursuit of solutions
to practical problems (the ultimate epistemic interests). Such reasoning is
interest-driven. A certain amount of reasoning is also driven directly by the input of new
information. So the basic interface between epistemic and practical cognition is as
diagrammed in figure 3.

Figure 2. The subservient role of epistemic cognition




Figure 3. The basic interface.

Figure 3 also indicates a distinction between two kinds of epistemic cognition:
epistemic reasoning and Q&I modules. Although it is hard to say precisely
what defines reasoning, we can say at least that it is serial, introspectible, and slow.
A great deal of human cognition is performed instead by special-purpose Q&I 
(quick and inflexible) modules. For example, one could try to catch a baseball by 
reasoning about its trajectory, but that would be too slow. Instead, we employ a 
cognitive module whose sole purpose is the calculation of trajectories. This module 
achieves speed in part by making assumptions about the environment, e.g., that the 
ball will not bounce off of obstacles. There is reason to believe that humans have 
many built-in Q&I modules, with reasoning serving primarily to monitor and correct 
their output and to enable us to address questions for which we have no Q&I 
modules. This would seem to be a good architecture for agents in general. The 
Q&I modules provide speed, and reasoning provides generality. 
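
The baseball example can be made concrete with a toy sketch. It is illustrative only and not drawn from OSCAR: the function names, the flat-ground physics, and the wall standing in for an "obstacle" are all assumptions. A closed-form range formula plays the role of a Q&I module (fast, but built on the assumption that nothing is in the way), while a step-by-step simulation stands in for slow, general reasoning that monitors that assumption and takes over when it fails.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (toy, flat-ground model)


def qi_landing_distance(speed, angle_deg):
    """Q&I module: closed-form range formula. Fast, but it silently assumes
    level ground and no obstacles between thrower and landing point."""
    a = math.radians(angle_deg)
    return speed * speed * math.sin(2 * a) / G


def reasoned_landing_distance(speed, angle_deg, wall_at=None, dt=0.001):
    """Stand-in for slow, general reasoning: step the trajectory forward in
    small time increments and stop early if it reaches a wall."""
    a = math.radians(angle_deg)
    x, y = 0.0, 0.0
    vx, vy = speed * math.cos(a), speed * math.sin(a)
    while True:
        x += vx * dt
        vy -= G * dt
        y += vy * dt
        if wall_at is not None and x >= wall_at:
            return wall_at  # the ball is stopped by the obstacle
        if y <= 0.0:
            return x  # the ball has come back down to ground level


def landing_distance(speed, angle_deg, wall_at=None):
    """Use the Q&I answer by default; reasoning monitors the module's
    no-obstacle assumption and overrides it when that assumption is violated."""
    fast = qi_landing_distance(speed, angle_deg)
    if wall_at is not None and wall_at < fast:
        return reasoned_landing_distance(speed, angle_deg, wall_at)
    return fast
```

With no wall, the fast module's answer is used unchanged; with a wall at 5 m, inside the predicted range of about 10.2 m, the monitor rejects the module's output and the slower computation returns 5.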

It is often overlooked that it may not be possible to answer the questions 
posed by practical cognition simply by reasoning from previously held beliefs. 
Instead, the agent may have to “examine the world”. E.g., it may have to discover 
what time it is by looking at a clock, or count the number of apples in a barrel, or 
look something up in an encyclopedia, or in an extreme case, send a spacecraft to 
Mars or run an experiment on a linear accelerator. Actions performed to acquire 
information constitute empirical investigation. Empirical investigation requires 
interacting with the world, and such interaction is driven by practical cognition, so 
the result is a loop wherein empirical investigation gives rise to epistemic goals, 
which are passed to practical cognition. Practical cognition then adopts interest in 
finding plans for achieving the epistemic goals, and passes that interest back to 
epistemic cognition. Thus epistemic and practical cognition are interleaved. 

Plans for achieving epistemic goals often include actions controlling sensors. 
There is a distinction between active and passive perception, analogous to the 
distinction between looking and seeing. In the basic interface, perception is essentially 
passive, just feeding information to epistemic cognition. But perception can be 
controlled by, e.g., controlling the direction of the sensors, thus controlling what 
the agent is looking at. This constitutes “active perception”. Empirical investigation 
can be combined with the basic interface to produce the diagram in figure 4. 

Sophisticated agents can also engage in reflexive cognition — cognition 
about cognition. Applying practical cognition to reasoning can enable the agent to 
alter the course of its own reasoning, deciding what to think about and what 
strategies to use in problem solving. It is an open question how much power the 
agent should have over its own cognition. For example, it is reasonable for the 
agent to be able to alter the priority of cognitive tasks waiting to be performed, but 
presumably we do not want an agent to be able to make itself believe something 
just because it wants to. Adding the ability to redirect cognition, we get the 
general architecture diagrammed in figure 5. 

Figure 4. Empirical investigation




Figure 5. The Architecture for Rational Cognition 

3. Epistemic Reasoning

Now let us look more closely at the structure of epistemic reasoning in OSCAR. 
We have seen that epistemic reasoning is driven by both input from perception and 
queries passed from practical cognition. This is accomplished in the OSCAR 
architecture by bidirectional reasoning. The queries passed to epistemic cognition 
from practical reasoning are epistemic interests, and OSCAR reasons backwards 
from them to other epistemic interests. In addition, OSCAR reasons forwards from 
beliefs to beliefs, until beliefs are produced that discharge the interests. 

OSCAR is based upon a “natural deduction” theorem prover, rather than a 
more traditional resolution refutation theorem prover. This is because one of the 
principal objectives in building OSCAR was to produce a defeasible reasoner (see
below). Resolution refutation makes essential use of reductio ad absurdum, and 
the latter is invalid for defeasible argumentation. Reasoning defeasibly to conflicting 
conclusions defeats the reasoning rather than disproving the premises. 

Perhaps the most novel aspect of OSCAR’s bidirectional reasoning is that 
reason-schemas are segregated into backward and forward schemas. Forward sche- 
mas lead from conclusions to conclusions. Backward schemas lead from interests 
to interests. For example, simplification (infer the conjuncts from a conjunction) is 
a fine rule to use in forward inference, but it would be combinatorially explosive 
when used in backward inference. Given an interest in P, it would have us adopt 
interest in every conjunction containing P as a conjunct. Similarly, adjunction 
(infer a conjunction from its conjuncts) is a natural rule to use for backward 
inference — given an interest in a conjunction, adopt interest in its conjuncts. But 
used in forward inference it would have the reasoner form arbitrary conjunctions
of its beliefs. A more sensible reasoner only forms conjunctions when they are of 
interest. 
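
A minimal sketch of this segregation, under assumptions of our own: formulas are encoded as nested tuples such as ("and", "P", "Q") (a hypothetical encoding, not OSCAR's), and only two deductive schemas are modeled. Simplification is registered only as a forward schema and adjunction only as a backward schema, so the reasoner never adopts interest in every conjunction containing P and never forms conjunctions that nobody asked about. A conjunctive interest is discharged once forward reasoning has produced both conjuncts.

```python
def is_conj(f):
    """Formulas are atoms (strings) or nested tuples ("and", left, right)."""
    return isinstance(f, tuple) and f[0] == "and"


def simplification(belief):
    """Forward schema only: from a conjunction, infer each conjunct."""
    return [belief[1], belief[2]] if is_conj(belief) else []


def adjunction(interest):
    """Backward schema only: interest in a conjunction yields interest in its conjuncts."""
    return [interest[1], interest[2]] if is_conj(interest) else []


def bidirectional_prove(premises, query):
    """Reason forward from premises and backward from the query until nothing changes."""
    beliefs, interests = set(premises), {query}
    changed = True
    while changed:
        changed = False
        for b in list(beliefs):  # forward: beliefs -> beliefs
            for c in simplification(b):
                if c not in beliefs:
                    beliefs.add(c)
                    changed = True
        for i in list(interests):  # backward: interests -> interests
            for sub in adjunction(i):
                if sub not in interests:
                    interests.add(sub)
                    changed = True
        for i in list(interests):  # discharge: form a conjunction only if it is of interest
            if is_conj(i) and i not in beliefs and i[1] in beliefs and i[2] in beliefs:
                beliefs.add(i)
                changed = True
    return query in beliefs, beliefs
```

Given the premise P & (Q & R) and the query R & P, the sketch concludes R & P but never forms an unrequested conjunction such as P & Q.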

The motivation for building OSCAR in this way was to provide a platform 
for defeasible reasoning, but OSCAR turns out to be surprisingly efficient as a
deductive reasoner. In a recent comparison with the highly respected OTTER
resolution-refutation theorem prover (http://www.mcs.anl.gov/home/mccune/ar/otter)
on a set of 163 problems chosen by Geoff Sutcliffe from the TPTP theorem
proving library (Suttner and Sutcliffe, [18]), OTTER failed to get 16 while OSCAR 
failed to get 3. On problems solved by both theorem provers, OSCAR (written in 
LISP) was on the average 40 times faster than OTTER (written in C). 

The principal virtue of OSCAR’s epistemic reasoning is not that it is an 
efficient deductive reasoner, but that it is capable of performing defeasible reasoning. 
Deductive reasoning guarantees the truth of the conclusion given the truth of the 
premises. Defeasible reasoning makes it reasonable to accept the conclusion, but 
does not provide an irrevocable guarantee of its truth. Conclusions supported 
defeasibly might have to be withdrawn later in the face of new information. All 
sophisticated epistemic cognizers must reason defeasibly. This is illustrated by the 
following considerations: 

• Perception is not always accurate. In order for a cognizer to correct for inaccurate 
perception, it must draw conclusions defeasibly and be prepared to withdraw 
them later in the face of conflicting information. 

• A sophisticated cognizer must be able to discover general facts about its
environment by making particular observations. Obviously, such inductive reasoning
must be defeasible because subsequent observations may show the generalization
to be false.

• Sophisticated cognizers must reason defeasibly about time, projecting conclusions 
drawn at one time forwards to future times (temporal projection). For example, 
consider a robot that can draw conclusions about its environment on the basis of 
perceptual input. It is given the task of comparing two meters, determining 
which has the higher reading. It examines the first meter and sees what it reads,
then turns to the second meter and reads it. But it is not yet in a position to
compare them, because while examining the second meter it only knows what
the first meter read a moment ago, not what it reads now. Humans solve this
problem by assuming that the first meter has not changed in the short amount of 
time it took to read the second meter, but this is obviously a defeasible assumption 
because it could have changed. This illustrates that perception is really a form 
of sampling, telling a cognizer about small parts of the world at different times, 
and if the cognizer is to be able to put different perceptions together into a 
coherent picture of the world it must be allowed to assume defeasibly that the 
world does not change too fast. 

• It will be argued below that certain aspects of planning must be done defeasibly 
in an autonomous agent operating in a complex environment. 

Defeasible reasoning is performed using defeasible reason-schemas. What 
makes a reason-schema defeasible is that inferences in accordance with it can be 
defeated. OSCAR recognizes two kinds of defeaters. Rebutting defeaters attack 
the conclusion of the inference. Undercutting defeaters attack the connection 
between the premise and the conclusion. An undercutting defeater for an inference
from P to Q is a reason for believing it false that P would not be true unless Q were
true. This is symbolized (P ⊗ Q), which can be read more simply as “P does not
guarantee Q”. For example, something’s looking red gives us a defeasible reason
for thinking it is red. A reason for thinking it isn’t red is a rebutting defeater.
However, if we know that it is illuminated by red lights, which can make
something look red when it isn’t, that is also a defeater, but it is not a reason for
thinking the object isn’t red. Thus it constitutes an undercutting defeater.

Reasoning defeasibly has two parts: (1) constructing arguments for conclusions
and (2) deciding what to believe given a set of interacting arguments some of 
which support defeaters for others. The latter is done by computing defeat statuses 
and degrees of justification given the set of arguments constructed. OSCAR uses 
the defeat-status computation described in Pollock [8]. 1 This defeat status computa- 
tion proceeds in terms of the agent’s inference-graph, which is a data structure 
recording the set of arguments thus far constructed. We then define: 

A partial-status-assignment for an inference-graph G is an assignment of
“defeated” and “undefeated” to a subset of the arguments in G such that for each
argument A in G:



1 For comparison with other approaches, see [12]. 

1. if a defeating argument for an inference in A is assigned “undefeated”, A is
assigned “defeated”;

2. if all defeating arguments for inferences in A are assigned “defeated”, A is
assigned “undefeated”.

A status-assignment for an inference-graph G is a maximal partial-status- 
assignment, i.e., a partial-status-assignment not properly contained in any other 
partial-status-assignment. 

An argument A is undefeated relative to an inference-graph G of which it is a 
member if and only if every status-assignment for G assigns “undefeated” to A. 

A belief is justified if and only if it is supported by an argument that is undefeated 
relative to the inference-graph that represents the agent’s current epistemological 
state. 
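
The definitions above can be checked by brute force on very small graphs. The sketch is illustrative only and is not OSCAR's defeat-status algorithm. It represents an inference-graph simply as a hypothetical map from each argument to the set of arguments that defeat one of its inferences, and it treats rebutting and undercutting defeaters alike, as the definitions do. Enumerating all 3^n candidate assignments (each argument defeated, undefeated, or unassigned) is feasible only for a handful of arguments.

```python
from itertools import product

DEFEATED, UNDEFEATED = "defeated", "undefeated"


def is_partial_status_assignment(assign, defeaters):
    """Check conditions 1 and 2 for every argument. `assign` maps an argument
    to a status; arguments missing from it are unassigned."""
    for arg, attackers in defeaters.items():
        statuses = [assign.get(d) for d in attackers]
        # 1. Some defeating argument is undefeated -> arg must be defeated.
        if any(s == UNDEFEATED for s in statuses) and assign.get(arg) != DEFEATED:
            return False
        # 2. All defeating arguments are defeated (vacuously true if there
        #    are none) -> arg must be undefeated.
        if all(s == DEFEATED for s in statuses) and assign.get(arg) != UNDEFEATED:
            return False
    return True


def status_assignments(defeaters):
    """Maximal partial-status-assignments, found by exhaustive enumeration."""
    args = sorted(defeaters)
    partial = []
    for combo in product([None, DEFEATED, UNDEFEATED], repeat=len(args)):
        assign = {a: s for a, s in zip(args, combo) if s is not None}
        if is_partial_status_assignment(assign, defeaters):
            partial.append(assign)

    def properly_contained(p, q):
        return p != q and all(q.get(k) == v for k, v in p.items())

    return [p for p in partial if not any(properly_contained(p, q) for q in partial)]


def is_undefeated(arg, defeaters):
    """Undefeated iff every status-assignment assigns it "undefeated"."""
    assignments = status_assignments(defeaters)
    return bool(assignments) and all(a.get(arg) == UNDEFEATED for a in assignments)


# Red-light case: "looks red, so red" is undercut by an unopposed argument.
red_light = {"red": {"lit_by_red_light"}, "lit_by_red_light": set()}
# Two arguments rebutting each other: neither survives (collective defeat).
standoff = {"red": {"blue"}, "blue": {"red"}}
# An undercutter defeats "blue", which reinstates "red".
reinstated = {"red": {"blue"}, "blue": {"red", "undercut_blue"}, "undercut_blue": set()}
```

In the standoff there are two status-assignments, each making a different argument undefeated, so by the definition neither argument is undefeated.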

Justified beliefs are those undefeated given the current stage of argument 
construction. Warranted conclusions are those that are undefeated relative to the 
set of all possible arguments that can be constructed given the current inputs. 
Raymond Reiter [13] and David Israel [4] both observed in 1980 that when reasoning 
defeasibly in a rich logical theory like first-order logic, the set of warranted conclu- 
sions will not generally be recursively enumerable. This is because determining 
whether an argument is defeated requires detecting consistency, and by Church’s 
theorem the latter is not recursively enumerable. This has the consequence that it 
is impossible to build an automated defeasible reasoner that produces all and only 
warranted conclusions. In other words, a defeasible reasoner cannot look much 
like a deductive reasoner. The most we can require is that the reasoner systematically 
modify its belief set so that it comes to approximate the set of warranted conclusions 
more and more closely. More precisely, the rules for reasoning should be such 
that: 

(1) if a proposition P is warranted then the reasoner will eventually reach a
stage where P is justified and stays justified;

(2) if a proposition P is unwarranted then the reasoner will eventually reach a 
stage where P is unjustified and stays unjustified. 

It is shown in Pollock [8] that this is possible if the reason-schemas are “well 
behaved”. 

Given the ability to perform general-purpose defeasible reasoning, we can 
then provide an agent with reason-schemas for reasoning about specific subject 
matters. For example, OSCAR makes use of the following reason-schemas: 

PERCEPTION 

Having a percept at time t with content P is a defeasible reason to believe
P-at-t.

PERCEPTUAL-RELIABILITY

“R is true and having a percept with content P is not a reliable indicator of P’s 
being true when R is true” is an undercutting defeater for PERCEPTION. 

TEMPORAL-PROJECTION 

“P-at-t” is a defeasible reason for “P-at-(t+Δt)”, the strength of the reason being
a monotonic decreasing function of Δt.

STATISTICAL-SYLLOGISM 

“c is a B & prob(A/B) is high” is a defeasible reason for “c is an A”.
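
TEMPORAL-PROJECTION only requires that the strength of the reason be some monotonic decreasing function of Δt; it does not say which. The sketch below assumes, purely for illustration, an exponential decay with a hypothetical rate parameter, and applies it to the two-meter example: a reading taken a moment ago supports a stronger conclusion about the present than one taken long ago.

```python
import math


def projection_strength(delta_t, decay_rate=0.05):
    """Strength of TEMPORAL-PROJECTION from P-at-t to P-at-(t + delta_t).

    The schema only requires a monotonic decreasing function of delta_t; the
    exponential form and the default decay_rate are illustrative assumptions.
    """
    if delta_t < 0:
        raise ValueError("temporal projection runs forward in time")
    return math.exp(-decay_rate * delta_t)


# Meter example: the first meter read 2 time units ago versus 60 units ago.
recent, stale = projection_strength(2), projection_strength(60)
```

Any other decreasing function (a linear ramp clipped at zero, say) would satisfy the schema equally well; the choice matters only for how projected conclusions compete with fresher perceptual ones.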

To illustrate the use of these reason-schemas, consider the following problem. 
First, Fred looks red to me. Later, I am informed by Merrill that I am then wearing 
blue-tinted glasses. Later still, Fred looks blue to me. All along, I know that the 
probability is not high of Fred being blue given that Fred looks blue to me but I am 
wearing blue-tinted glasses. What should I conclude about the color of Fred? 
Intuitively, Fred’s looking red gives me a reason for thinking that Fred is red. 
Being informed by Merrill that I am wearing blue-tinted glasses gives me a reason 
for thinking I am wearing blue-tinted glasses. Fred’s later looking blue gives me a 
reason for thinking the world has changed and Fred has become blue. However, 
my knowledge about the blue-tinted glasses defeats the inference to the conclusion 
that Fred is blue, reinstating the conclusion that Fred is red. OSCAR’s reasoning is 
diagrammed by figure 6, where the “fuzzy” arrows indicate defeat relations. 2 



Figure 6. Inference graph



For more details on this, see [9]. 

For a rational agent to be able to construct plans for making the environment
more to its liking, it must be able to reason causally; in particular, it must be able
to reason its way through the frame problem. OSCAR implements a solution to the
frame problem. It has three constituents:

TEMPORAL-PROJECTION, discussed above. 

CAUSAL-IMPLICATION, which allows us to make inferences on the basis of 
causal knowledge: 

If t* > t, “A-at-t and P-at-t and (A when P is causally-sufficient for Q)” is a
defeasible reason for “Q-at-t*”.

CAUSAL-UNDERCUTTER, which tells us that inferences based on causal knowl- 
edge take precedence over inferences based on temporal projection: 

If t0 < t < t*, “A-at-t and P-at-t and (A when P is causally-sufficient for ~Q)” is
an undercutting defeater for the inference from Q-at-t0 to Q-at-t* by
TEMPORAL-PROJECTION.




Figure 7. The Yale Shooting Problem

These principles can be illustrated by applying them to the Yale shooting 
problem [2]. I know that the gun being fired while loaded will cause Jones to 
become dead. I know that the gun is initially loaded, and Jones is initially alive. 
Later, the gun is fired. Should I conclude that Jones becomes dead? Yes, defeasibly. 
OSCAR solves this problem by reasoning as in figure 7. By TEMPORAL-PROJECTION,
OSCAR has a reason to think that Jones will be alive. By CAUSAL-IMPLICATION,
OSCAR has a reason to think that Jones will be dead. By CAUSAL-UNDERCUTTER,
the latter takes precedence, defeating the former. 3
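
The time-ordering condition in CAUSAL-UNDERCUTTER is what lets causal knowledge override persistence. A minimal sketch of just that applicability test, with hypothetical class names and made-up times (Jones alive at 20, the gun fired at 30, the question asked about 50): a shot fired strictly between t0 and t* undercuts the projection of "Jones is alive", while a shot fired after t* does not. The sketch does not model how OSCAR concludes that the gun is still loaded at 30, which would itself come from TEMPORAL-PROJECTION.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Projection:
    """A TEMPORAL-PROJECTION inference from Q-at-t0 to Q-at-t_star."""
    fact: str
    t0: float
    t_star: float


@dataclass(frozen=True)
class CausalEvent:
    """A-at-t and P-at-t, where (A when P) is causally sufficient for ~negated_fact."""
    action: str
    circumstance: str
    negated_fact: str
    t: float


def causally_undercut(proj, event):
    """CAUSAL-UNDERCUTTER: the projection is undercut when the event falls strictly
    between t0 and t* and causally yields the negation of the projected fact."""
    return event.negated_fact == proj.fact and proj.t0 < event.t < proj.t_star


alive = Projection("Jones is alive", t0=20, t_star=50)
shooting = CausalEvent("fire the gun", "the gun is loaded", "Jones is alive", t=30)
late_shot = CausalEvent("fire the gun", "the gun is loaded", "Jones is alive", t=60)
```

Once the projection is undercut, the competing CAUSAL-IMPLICATION conclusion that Jones is dead at 50 no longer faces a rebutting argument.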



4. Practical Cognition 

Given an agent capable of sophisticated epistemic cognition, how can it make use 
of that in practical cognition? We can regard practical cognition as having four
components: goal selection, plan-construction, plan-selection, and plan-execution.
Although it is natural to think of these as components of practical cognition, most of
the work will be carried out by epistemic cognition. To illustrate this, I will focus 
on plan-construction. 

Standard planning algorithms assume that we come to the planning problem 
with all the knowledge needed to solve it. This assumption fails for autonomous 
rational agents. The more complex the environment, the more the agent will have 
to be self-sufficient for knowledge acquisition. The principal function of epistemic 
cognition is to provide the information needed for practical cognition. As such, the 
course of epistemic cognition is driven by practical interests. Rather than coming 
to the planning problem equipped with all the knowledge required for its solution, 
the planning problem itself directs epistemic cognition, focusing epistemic endeavors 
on the pursuit of information that will be helpful in solving current planning 
problems. Paramount among the information an agent may have to acquire in the 
course of planning is knowledge about the consequences of actions under various 
circumstances. Sometimes this knowledge can be acquired by reasoning from 
what is already known. But often it will require empirical investigation. Empirical 
investigation involves acting, and figuring out what actions to perform requires 
further planning. So planning drives epistemic investigation, which may in turn 
drive further planning. In autonomous rational agents operating in a complex 
environment, planning and epistemic investigation must be interleaved. 

I assume that rational agents will engage in some form of goal-regression 
planning. This involves reasoning backwards from goals to subgoals whose achieve- 
ment will enable an action to achieve a goal. Such reasoning proceeds in terms of 
causal knowledge of the form “performing action A under circumstances C is
causally sufficient for achieving goal G”. This is symbolized by the
planning-conditional (A/C) => G.
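
Goal regression over planning-conditionals can be sketched in a few lines. This is a deliberately simplified illustration, not OSCAR's planner: plans are totally ordered action lists, causal links and explicit ordering constraints are omitted, circumstances are single atoms, the depth bound is an assumption added to guarantee termination, and the office domain is invented.

```python
from typing import List, NamedTuple, Optional, Sequence, Set


class PlanningConditional(NamedTuple):
    """(A/C) => G: doing `action` when `circumstance` holds is causally sufficient
    for `goal`. A circumstance of None means the action needs no precondition."""
    action: str
    circumstance: Optional[str]
    goal: str


def regress(goal: str, initial: Set[str],
            conditionals: Sequence[PlanningConditional],
            depth: int = 5) -> Optional[List[str]]:
    """Return a totally ordered list of actions achieving `goal`, or None."""
    if goal in initial:
        return []
    if depth == 0:
        return None
    for cond in conditionals:
        if cond.goal != goal:
            continue
        if cond.circumstance is None:
            subplan = []
        else:
            # Regress: the circumstance C becomes the new subgoal.
            subplan = regress(cond.circumstance, initial, conditionals, depth - 1)
        if subplan is not None:
            return subplan + [cond.action]  # the new step goes at the end of the subplan
    return None


conditionals = [
    PlanningConditional("drive to the office", "at home", "at the office"),
    PlanningConditional("switch on the terminal", "at the office", "terminal on"),
]
```

Regressing "terminal on" from the initial state {"at home"} yields the two-step plan of driving to the office and then switching on the terminal; a goal with no matching conditional returns None.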

A generally recognized problem for goal-regression planning is that subgoals 
are typically conjunctions. We usually lack causal knowledge pertaining directly 
to conjunctions, and must instead use causal knowledge pertaining to the individual 
conjuncts. We plan separately for the conjuncts of a conjunctive subgoal. When 
we merge the plans for the conjuncts, we must ensure that the separate plans do not 
destructively interfere with each other (we must “resolve threats”). Conventional 
planners assume that the planner already knows the consequences of actions under 
all circumstances, and so destructive interference can be detected by just checking 



For more details on this, see [9].
the consequences. However, an autonomous rational agent may have to engage in
arbitrarily much epistemic investigation to detect destructive interference. Even if 
threats could be detected simply by first-order deductive reasoning, the set of 
threats would not be recursive. The following theorem is proven in Pollock [10]: 

If the set of threats is not recursive, then the set of planning (problem, solution) 
pairs is not recursively enumerable. 

Corollary: A planner that insists upon ruling out threats before merging plans 
for the conjuncts of a conjunctive goal may never terminate. 

If the set of threats is not recursive, a planner must operate defeasibly, assuming 
that there are no threats unless it has reason to believe otherwise. That a plan will 
achieve a goal is a factual matter, of the sort normally addressed by epistemic 
cognition. So we can perform plan-search by adopting a set of defeasible reason- 
schemas for reasoning about plans. The following are examples of such reason- 
schemas: 

GOAL-REGRESSION 

Given an interest in finding a plan for achieving G-at-t, adopt interest in finding
a planning-conditional (A/C) => G. Given such a conditional, adopt interest in
finding a plan for achieving C-at-t*. If it is concluded that a plan subplan will
achieve C-at-t*, construct a plan by (1) adding a new step to the end of subplan
where the new step prescribes the action A-at-t*, (2) adding a constraint (t* < t)
to the ordering-constraints of subplan, and (3) adjusting the causal-links
appropriately. Infer defeasibly that the new plan will achieve G-at-t.

SPLIT-CONJUNCTIVE-GOAL 

Given an interest in finding a plan for achieving (G₁-at-t₁ & G₂-at-t₂), adopt
interest in finding plans plan₁ for G₁-at-t₁ and plan₂ for G₂-at-t₂. Given such
plans, infer defeasibly that the result of merging them will achieve (G₁-at-t₁ &
G₂-at-t₂).



A number of additional reason-schemas are also required, but a complete planner 
can be constructed in this way. To illustrate, consider Pednault’s [6] briefcase 
example. My briefcase and paycheck are initially at home, and the paycheck is in 
the briefcase. My goal is to have the briefcase at the office but the paycheck at 
home. OSCAR begins by producing the plan diagrammed in figure 8. This is a 
flawed plan, because taking the briefcase to the office also takes the paycheck to 
the office. Having produced this plan defeasibly, OSCAR undertakes a search for 
defeaters, and finds the defeating subplan of figure 9. OSCAR then fixes the 
flawed plan by adding a step that defeats the defeater, as in figure 10. The final
plan is then that of figure 11.
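
The briefcase example can be traced with a toy sketch of SPLIT-CONJUNCTIVE-GOAL followed by a search for defeaters. Everything here is a hypothetical simplification: states are flat dictionaries, merging is plain concatenation, the consequence model is hand-written, and the repair step is supplied by hand rather than found by the planner. Because this consequence model is finite, the defeater search terminates; the point of the text is that in general it need not.

```python
from typing import Callable, Dict, List

State = Dict[str, object]


def briefcase_consequences(state: State, action: str) -> State:
    """Hypothetical consequence model for the briefcase domain."""
    s = dict(state)
    if action == "take briefcase to office":
        s["briefcase"] = "office"
        if s["paycheck in briefcase"]:
            s["paycheck"] = "office"  # side effect: the paycheck travels too
    elif action == "remove paycheck from briefcase":
        s["paycheck in briefcase"] = False
    return s


def split_conjunctive_goal(goals: State, plan_for: Callable[[str, object], List[str]]) -> List[str]:
    """Plan for each conjunct separately and merge the plans (here, by simple
    concatenation). That the merge achieves every conjunct is only a defeasible
    conclusion."""
    merged: List[str] = []
    for name, value in goals.items():
        merged += plan_for(name, value)
    return merged


def find_defeaters(plan: List[str], goals: State, initial: State,
                   consequences: Callable[[State, str], State]) -> List[str]:
    """Search for threats: conjuncts that the merged plan fails to achieve."""
    state = dict(initial)
    for action in plan:
        state = consequences(state, action)
    return [name for name, value in goals.items() if state.get(name) != value]


initial = {"briefcase": "home", "paycheck": "home", "paycheck in briefcase": True}
goals = {"briefcase": "office", "paycheck": "home"}
subplans = {("briefcase", "office"): ["take briefcase to office"], ("paycheck", "home"): []}

plan = split_conjunctive_goal(goals, lambda name, value: subplans[(name, value)])
print(find_defeaters(plan, goals, initial, briefcase_consequences))      # ['paycheck']
repaired = ["remove paycheck from briefcase"] + plan
print(find_defeaters(repaired, goals, initial, briefcase_consequences))  # []
```

The merged plan is adopted defeasibly, the search then turns up the paycheck threat, and prefixing a step that removes the paycheck defeats that defeater.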






Figure 9. Defeating subplan 

Figure 10. Defeating the defeater

Figure 11. The final plan

Given a recursive list of consequences for each action, this planner will
produce the same plans as conventional planners like UCPOP [7] or PRODIGY 
[19]. But when the list of consequences is nonrecursive the conventional planners 
will not return solutions, whereas OSCAR will still return the same solutions 
defeasibly. The search for defeaters may never terminate, but when the agent must 
act it can do so using its currently justified conclusions, which include the plans 
that were found defeasibly. 



5. Adopting Plans 

Thus far, everything that I have talked about has been implemented. Now I turn
to work in progress, which has not yet been implemented.

Plan construction produces plans that purport to achieve their goals, but 
adopting such a plan requires a further cognitive step. Such a plan is not automatically 
adoptable. For example, its execution costs might be greater than the value of the 
goal achieved. Or it might interact adversely with other plans already adopted, 
increasing their execution costs or lowering the value of their goals. Sometimes 
the impacted plan should be rejected, and sometimes the new plan should be 
rejected. In deciding whether to adopt a plan, we must evaluate it in a roughly 
decision-theoretic manner. This might suggest the use of Markov-decision planning
(MDPs). However, a generally recognized problem for MDPs is computational
infeasibility in complex domains. Although a lot of current research is directed at 
alleviating this problem [1], I am betting that there is no ultimate solution to it. So 
we must look for another way of doing decision-theoretic planning. 

My proposal is that it is possible to perform feasible decision-theoretic planning 
by modifying conventional goal-regression planning in certain ways. Goal-regression 
planning can be performed by applying classical planning algorithms but appealing 
to probabilistic connections rather than exceptionless causal connections. This is 
computationally easier than standard “probabilistic planning” in the style, e.g., of 
BURIDAN [5], which uses probabilities to drive the planning. On my proposal, 
the planning is done conventionally and the probabilities are computed afterwards. In this
connection, it is important to realize that it isn’t really the probability of the plan 
achieving its goals that is important — it is the expected value. The expected 
value can be high even with a low probability if the goals are sufficiently valuable. 

Once a plan is constructed, an expected value can be computed. This compu- 
tation can be simplified by doing it defeasibly. An initial computation can be made 
using just the probabilities of plan-steps having their desired outcomes in isolation. 
Then a search can be undertaken for conditions established by other steps of the 
plan that alter the probabilities. This is analogous to the search for threats in 
deterministic causal-link planning. 

Similarly, the initial computation uses default values for the execution costs 
of plan steps and the values of goals. However, these values can be different in the 
context of conditions established by other steps of the plan. Clearly, earlier steps 
can change the execution costs of later steps. For example, if a step transports a 
package from one point to another by truck, and an earlier step moves the truck, 
that can change the execution cost. Goals don’t have fixed values either. For
example, suppose my goal is to have a dish of vanilla ice cream, and a playful 
friend offers to give me one if I eat a dill pickle first. I may then construct the plan 
to eat a dill pickle in order to obtain the dish of ice cream. However, the value of 
the goal is greatly diminished by eating the pickle. So a search must be undertaken 
for conditions that alter execution costs and the values of goals. This is also 
analogous to the search for threats. 
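
A minimal sketch of the two-stage computation, under assumptions the text leaves open: the goal is achieved only if every step succeeds, step outcomes are independent, and expected value is (probability of success × goal value) minus total execution cost. The "search" for altering conditions is reduced here to scanning a supplied list of hypothetical adjustments, each of which overrides a step's probability or cost, or the goal's value, once some earlier step has established a given condition. The pickle numbers are invented.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Step:
    name: str
    prob: float                 # probability of the desired outcome, in isolation
    cost: float                 # default execution cost
    establishes: frozenset = frozenset()


@dataclass(frozen=True)
class Adjustment:
    """If `condition` has been established by an earlier step, override an
    attribute ("prob" or "cost") of the step named `target`, or, when target
    is "goal", replace the goal's value."""
    condition: str
    target: str
    attribute: str
    new_value: float


def expected_value(steps, goal_value):
    # Assumes the goal is achieved iff every step succeeds, independently.
    p = 1.0
    for s in steps:
        p *= s.prob
    return p * goal_value - sum(s.cost for s in steps)


def defeasible_expected_value(steps, goal_value, adjustments):
    """Return (initial estimate from default values, estimate after adjustments)."""
    initial = expected_value(steps, goal_value)
    established, revised = set(), []
    for s in steps:
        for a in adjustments:
            if a.target == s.name and a.condition in established:
                s = replace(s, **{a.attribute: a.new_value})
        revised.append(s)
        established |= s.establishes
    value = goal_value
    for a in adjustments:
        if a.target == "goal" and a.condition in established:
            value = a.new_value
    return initial, expected_value(revised, value)


# The pickle example, with made-up numbers.
plan = [Step("eat a dill pickle", 1.0, 1.0, frozenset({"ate pickle"})),
        Step("collect the ice cream", 0.5, 0.0)]
pickle_spoils_dessert = Adjustment("ate pickle", "goal", "value", 4.0)
```

The default computation rates the plan at 4.0; once the pickle is found to lower the value of the ice cream from 10 to 4, the revised estimate drops to 1.0. The truck example would be an adjustment of a step's cost in the same way.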

Obviously, expected values can be increased by adding conditional steps. 
Less obviously, expected values can be increased by planning hierarchically. A 
high-level plan can have a higher expected value than any of its low-level
specifications. This is because we may be confident of being able to fix low-level plans
if they misfire. For example, I can be more confident that I will be able to fly to
LA than I am that I can fly to LA on any particular flight.

The preceding computation computes the expected value of the plan in isola- 
tion. But that is not the relevant expected value. The agent may have adopted 
other plans whose execution will change the context and hence change both the 
probabilities and values used in computing expected values. Let the agent’s master 
plan be the result of merging all of the agent’s local plans into a single plan. In 
deciding whether to adopt a new plan, what is really at issue is the effect that adopting it will
have on the expected value of the master plan. Changes to the master plan may
consist of simultaneously adopting and withdrawing several plans. It is changes 
that must be evaluated decision-theoretically. The value of a change is the difference 
between the expected value of the master plan after the change and its expected 
value before the change. This is the differential expected value of the change. 

In a realistic agent in a complex environment, the master plan may grow 
very large. It is important to be able to employ simple defeasible computations of 
expected value. It can be assumed defeasibly that different local plans are evaluatively 
independent, in the sense that the expected value of the combined plan is the sum
of the expected values of the individual plans. This makes it easy to compute 
differential expected values defeasibly. The search for considerations that would 
make this defeasible assumption incorrect is precisely the same as the search 
described above for considerations within a plan that would change the defeasible 
computation of its expected value. The only difference is that we look for consider- 
ations established by other constituents of the master plan. 
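Under the evaluative-independence assumption, the differential expected value of a change reduces to simple bookkeeping. The sketch below is a minimal illustration, not OSCAR's implementation. The plan names and values are hypothetical, and a master plan is modeled simply as a mapping from local plans to their expected values.

```python
from typing import Dict, Iterable


def master_plan_value(local_plans: Dict[str, float]) -> float:
    """Defeasible estimate of the master plan's expected value, assuming the
    local plans are evaluatively independent (their values simply add up)."""
    return sum(local_plans.values())


def differential_expected_value(master: Dict[str, float],
                                adopt: Dict[str, float],
                                withdraw: Iterable[str]) -> float:
    """EV(master plan after the change) - EV(master plan before the change)."""
    withdrawn = set(withdraw)
    after = {name: ev for name, ev in master.items() if name not in withdrawn}
    after.update(adopt)
    return master_plan_value(after) - master_plan_value(master)


# Hypothetical local plans and expected values.
master = {"commute": 5.0, "buy_groceries": 2.0}
print(differential_expected_value(master, {"fly_to_LA": 8.0}, ["commute"]))  # 3.0
```

A defeater found by the search described above would override this additive estimate. The function only supplies the default.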

Conventional decision theory would tell us to choose a master plan having a 
maximal expected value. That is at least computationally infeasible in complex 
domains. There may not even be a maximally good plan. In many domains it may 
be that we can always improve the master plan marginally by adding more local 
plans. Instead of maximizing, we must satisfice: seek plans with positive expected
values, and always maintain an interest in finding better plans. A plan is defeasibly 
adoptable if it has a positive expected value, or if its addition to the master plan 
increases the value of the latter. The adoption is defeated by finding another plan 
that can be added to the master plan in its place and will increase the value of the 
master plan further. So we are always on the lookout for better plans, but we are 
not searching for a single “best” plan. 
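The satisficing rule can be sketched as an anytime loop. This is a minimal illustration under simplifying assumptions. Candidate plans are treated as alternatives for the same place in the master plan, and they arrive one at a time from a planner. A function supplying each candidate's differential expected value is assumed to exist. Each yield is the current defeasible adoption, and it is defeated by any later candidate that would raise the master plan's value further.

```python
from typing import Callable, Iterable, Iterator, Tuple


def satisfice(candidates: Iterable[str],
              diff_ev: Callable[[str], float]) -> Iterator[Tuple[str, float]]:
    """Yield a new adoption each time a better candidate turns up.

    A candidate is defeasibly adoptable if adding it raises the master plan's
    expected value (positive differential expected value). That adoption is
    defeated when a later candidate would raise the value further, so the
    agent can act on the latest yield at any time instead of waiting for a
    "best" plan, which may not exist.
    """
    best_gain = 0.0
    for plan in candidates:          # plans arrive as the planner finds them
        gain = diff_ev(plan)
        if gain > best_gain:         # positive, and better than the current adoption
            best_gain = gain
            yield plan, gain


# Hypothetical candidate plans with their differential expected values.
gains = {"A": 1.0, "B": -2.0, "C": 3.0, "D": 2.5}
print(list(satisfice(["A", "B", "C", "D"], gains.__getitem__)))
# [('A', 1.0), ('C', 3.0)]
```

Because the function is a generator, it also works on an unbounded stream of candidates, which matches the point that there may be no maximally good plan.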
6. Conclusions

• An architecture for “anthropomorphic agents” must mimic (but not necessarily 
duplicate) human rational cognition. 

• Practical cognition makes choices based upon information supplied by epistemic 
cognition. 

• Most of the work in rational cognition is carried out by epistemic cognition, and 
must be done defeasibly. 

• OSCAR implements a sophisticated system of defeasible reasoning that enables
it to deal defeasibly with perception, change and persistence, causation,
probabilities, etc.

• Sophisticated agents operating in complex environments cannot plan by using 
conventional planning algorithms that produce r.e. sets of solutions. 

• However, the ideas underlying conventional planning algorithms can be
resurrected as defeasible principles for reasoning about plans.

• Defeasible principles of deterministic planning can be generalized to produce 
defeasible principles of decision-theoretic planning. 

• In decision-theoretic planning, decisions about whether to adopt new plans (and
perhaps to reject previously adopted plans) must be made on the basis of the
effect that doing so has on the expected value of the master plan.

• An efficient computation of the expected value of the master plan can be done 
defeasibly. 
Abstract. In this paper we consider an environment that consists of one
broadcasting entity (producer), which broadcasts information to a large number
of personal computer users (consumers), who can download information to their
PC disks. We concentrate on the most critical phase of the broadcasting
system operation, which is the characterization of the users’ needs in order to
maximize the efficiency of the broadcast information. Since the broadcasting
system cannot consider each user in isolation, it has to consider certain
communities of users. We have proposed using a hierarchic distributed model
of software agents to facilitate receiving feedback from the users by the
broadcasting system. These agents cluster the system’s users into communities
with similar interest domains. Subsequently, these agents calculate a
representative profile for each community. Finally, the broadcasting agent
builds an appropriate broadcasting program for each community. We developed
a simulation of the broadcasting environment in order to evaluate and analyze
the performance of our proposed model and techniques. The simulation results
support our hypothesis that our techniques provide broadcasting programs
that are of great interest to the users.



1 Introduction 

Data dissemination, which is the delivery of data from a set of producers to a larger
set of consumers, has attracted a lot of attention recently. This is because of
ongoing advances in communications, including the proliferation of the Internet, the
development of wireless networks, and the impending availability of high-bandwidth
links to the home, which enable information broadcasting. In this paper, we present an
agent model and algorithms that handle the particular data-dissemination problems
that arise in information-broadcasting environments.

The broadcasting system we refer to consists of one broadcasting entity, which
broadcasts multimedia information to a large number of personal computer users,
who can download information to their PC disks.

The most critical requirement of the broadcasting system is receiving feedback
from the users on their specific interest domains in order to broadcast information that
satisfies the users’ needs [Franklin and Zdonik 1996]. In order to enable the
broadcasting system to receive this feedback, we assume that there is a way to
transmit information from a user’s PC to the broadcasting system over a
low-capacity channel.

Our research focuses on the most important phase of the broadcasting system
operation, which is the integration of all individual users’ needs in order to maximize
the efficiency of the broadcast data. Since the broadcasting system cannot consider
each user in isolation, it will have to consider certain communities of users.
Consequently, the broadcasting system will have to generalize and integrate the
users’ needs in a way that maximizes both the satisfaction of individuals and that
of the general user community. The output of this process is termed the
integrated users profile. We have designed and developed a model of software agents
and algorithms to facilitate the receiving of feedback from the users by the
broadcasting system, and we have introduced the concept of the integrated users profile,
which is an information model that represents the overall users' interests.

We suggest that there will be three general types of agents in the broadcasting 
framework: 

1 User’s Agents: There are three resident agents at the user's PC: a user profile
agent, a filtering agent, and a browser agent. The user profile agent accurately
captures and characterizes its user’s interest domains by relying on past
experience. The filtering agent dynamically chooses the broadcasting
channels that best suit its user and determines which documents to
download. The browser agent’s task is to help the user browse
information after it has been downloaded to the disk of his/her personal
computer.

2 Integrated Users Profile Agents: The integrated users profile agents divide
the user community into groups with the same interest domains and construct a
community profile for each group of users.

3 Broadcasting Agent: The broadcasting agent monitors the broadcasting
program, based on the integrated users profile.

Section 2 describes the representation of a multimedia document and the user 
profile representation in the broadcasting environment. Section 3 presents the 
algorithms and the approaches we adopted to solve the problem of the integrated users 
profile. The algorithms that the broadcasting agent uses to perform its tasks are
described in Section 4. Section 5 describes the simulation of the broadcasting
environment and presents the tests and results. Finally, the conclusions are presented
in Section 6.



2 The Main Data Structures of the Broadcasting Environment 



The representation of a multimedia document has a hierarchical structure.
Its design was motivated by the Newt system [Sheth 1994], which uses a
similar hierarchic structure. The document representation structure consists
of two levels. The highest level consists of the characteristic fields,
for example, the source of the document field, the author field, the class field, the
subclass field, the multimedia type which the document contains, the keywords field,
and the size of the document field. All of the fields except the size field have a pointer
to a weighted keyword vector, which appears in the second level of the representation
structure and consists of pairs of term and weight that reflect the document content.
The sum of the weights in each vector is 1.
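A minimal sketch of this two-level representation may help. It assumes that keyword vectors are stored as term-to-weight dictionaries and that raw weights are normalized to sum to 1 by simple rescaling. The field names and the size value used below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


def normalize(weights: Dict[str, float]) -> Dict[str, float]:
    """Scale a weighted keyword vector so that its weights sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("a keyword vector needs at least one positive weight")
    return {term: w / total for term, w in weights.items()}


@dataclass
class DocumentRepresentation:
    """Two-level representation: characteristic fields on the first level,
    each (except size) pointing to a weighted keyword vector on the second."""
    size: int                                   # the only field without a vector
    fields: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def set_field(self, name: str, raw_weights: Dict[str, float]) -> None:
        """Store a field's keyword vector, normalized to sum to 1."""
        self.fields[name] = normalize(raw_weights)


doc = DocumentRepresentation(size=20480)        # hypothetical size value
doc.set_field("class", {"sports": 1.0})
doc.set_field("keywords", {"agents": 3.0, "broadcast": 1.0})
print(doc.fields["keywords"])  # {'agents': 0.75, 'broadcast': 0.25}
```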

Tremendous effort has been exerted with respect to the task of building user 
profiles. The motivation to build a user profile arises in many areas, such as filtering 
systems [Lang 1995], retrieval systems [Tomsic et al. 1994], browsing agents 
[Hammond et al. 1994], and on-demand systems [Nygren et al. 1996]. Many systems 
have been developed to combine machine learning techniques with information 
filtering techniques to create a user profile [Edwards et al. 1996, Maes and Kozierok 
1993, Hammond et al. 1994, Lang 1995].

Software agents seem to be a good solution to the problem of building a user 
profile [Edwards et al. 1996, Sheth 1994]. These software agents should be 
responsible for keeping track of and adapting to the changeable interests of the user. 
We did not concentrate on the development of a user profile agent, based on the 
assumption that the user profile agent in the broadcasting system can use some of the 
already existing technologies [Sheth 1994, Shardanand and Maes 1995, Edwards et al. 
1996, Moukas 1996]. However, we are interested in the development of a user profile
representation, the integration of a set of user profiles into one or more profiles, and
the use of these profiles in the task of preparing a broadcasting program.

The user profile structure in our system is similar to the document representation 
and is also a hierarchic structure. It can be seen as a tree structure. Each branch in the 
tree actually characterizes the interest of a user in a certain domain. There are four 
levels in each branch of the tree. These levels gradually extend the description of each 
interest domain. It is not necessary that all the users of the system have exactly the 
same structure. The four levels of the user profile structure are: 

1. Class level;

2. Subclass level; 

3. Description level; 

4. Term level. 

Each node in the profile structure is a vector. A vector consists of fields. Each field
contains its field name (a string), its weight, and a pointer to a node in the level below
in the tree. The sum of the weights in each vector is 1 (see Figure 1). The weights of
some fields can be zero.
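The tree can be sketched with two small types: a vector of fields, and a field that holds a name, a weight, and an optional pointer one level down. This is an illustrative model under stated assumptions, not the authors' data structure. The validation function checks the two constraints stated above, namely that each vector sums to 1 and that there are at most four levels. It lets branches stop early, since users need not share the same structure. The example profile is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

LEVELS = ("class", "subclass", "description", "term")


@dataclass
class Field:
    name: str
    weight: float
    child: Optional["Vector"] = None   # node one level down; None ends the branch


@dataclass
class Vector:
    fields: List[Field] = field(default_factory=list)

    def is_normalized(self, tol: float = 1e-9) -> bool:
        return abs(sum(f.weight for f in self.fields) - 1.0) <= tol


def check_profile(node: Vector, depth: int = 0) -> None:
    """Raise ValueError unless every vector sums to 1 and the tree has at most
    four levels. Branches may stop early, since users need not share a shape."""
    if depth >= len(LEVELS):
        raise ValueError("profile has more than four levels")
    if not node.is_normalized():
        raise ValueError(f"weights at the {LEVELS[depth]} level do not sum to 1")
    for f in node.fields:
        if f.child is not None:
            check_profile(f.child, depth + 1)


# A hypothetical profile with one fully expanded branch.
terms = Vector([Field("goal", 0.6), Field("league", 0.4)])
descriptions = Vector([Field("european football", 1.0, terms)])
subclasses = Vector([Field("football", 0.7, descriptions), Field("tennis", 0.3)])
profile = Vector([Field("sports", 0.5, subclasses),
                  Field("science", 0.5),
                  Field("politics", 0.0)])     # zero weights are allowed
check_profile(profile)                         # passes silently
```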
Fig. 1. The representation of a user profile.



3 Integrated Users Profile Agents 

The motivation to build an integrated users’ profile emanates from the fact that we are 
dealing with a broadcasting system. In a broadcasting environment it is unreasonable 
to consider each user/client in isolation. Instead, we need to consider each individual 
user profile with respect to a certain community. In spite of this generalization of
users’ preferences, the main objective of the broadcasting system is to maximize the
satisfaction of individuals, as well as the general welfare.

An integrated users profile is a set of community profiles whose structures are 
identical to the user profile’s structure. Each community profile represents the 
preferences of a user community. In addition, it has to be as close as possible to each 
of the user profiles that belong to the community. The main characteristic of the 
community is that all its members have a similar distribution of interest domains. 
Building an integrated users profile is the most critical phase in our broadcasting data 
system. The algorithm that the integrated users profile agents use is based on concepts 
from social information filtering algorithms typically used in a recommendation 
system [Shardanand and Maes 1995, Nygren et al. 1996]. 

The architecture of the integrated users profile agents is described in the following
subsections.



3.1 The Architecture of the Integrated Users Profile Agents 

The architecture of the integrated users profile agents is distributed and has a 
hierarchical structure (see Figure 2). It consists of three types of agents: clustering 
agents, a union set agent, and community agents. Each type of agent is located at a 
different level of the hierarchy. 
• Clustering Agent: Divides all the users who belong to its geographical
area into groups of similar taste (that is, users with similar user profiles).

• Union Set Agent: Unifies similar groups from different regions. 

• Community Agent: Builds a community profile on behalf of its 
community’s members. 

In general, we divide the geographic area of all the broadcasting system’s users 
into regions. Then we assign a clustering agent to each of the regions. A clustering 
agent will receive a set of individual user profiles from all the users in its region. The 
role of a clustering agent is to split its users into communities, which are groups of
users with similar taste and the same subjects of interest. As described in Section 2,
the user profile structure consists of four description levels. At the highest level is the
class vector, which indicates the weight that each class (subject of interest) received. A
clustering agent may rely on the weighted class vector of each user profile (see Figure
1) or may use the information from additional levels for computing the similarity
between each pair of users. Based on the inter-user similarities, the clustering agent
divides its users into communities by using a clustering method, which we propose
and discuss in the following section [Salton and McGill 1983, Rasmussen 1992,
Rijsbergen 1979].

After splitting the users into communities, the clustering agent finds a 
representative weighted classes vector for each of the generated communities. It then 
transmits all the representative weighted classes vectors (or more complicated 
structures consisting of more than one level) of the communities it generated to the 
union set agent. The union set agent, in turn, will use a clustering method to unify
similar communities from different regions. In the next phase, the union set
agent will send the final set of communities to all the clustering agents. That is, for 
each of the community representatives that the clustering agent found, the union set 
agent will send a new community representative that corresponds to the old one and 
represents a number of communities from different regions. Then, each clustering 
agent will send the complete user profiles of each user from its region to the 
appropriate community agent. Each community agent will then calculate a complete 
representative profile of its users’ community using our proposed algorithm and will 
send it to the broadcasting agent. Figure 2 illustrates the hierarchic structure of the 
integrated users profile agents and the interactions between the various types of
agents.



3.2 Clustering Agents 

Each clustering agent will be responsible for a certain region of the broadcasting 
system’s users. The set of clustering agents distributes the task of collecting the user 
profiles from all the users and the task of splitting the users into relatively small 
groups with similar taste. The clustering agent receives a complete user profile from 
all its users. 

Many document-clustering algorithms are available. These algorithms are mainly
used in filtering and retrieval systems. In many information-filtering systems, a
document is represented as a weighted keyword vector. Since the user profile
structure is based on the weighted keyword vector model, we can also use some of
the available document classification techniques to classify the set of user profiles.
We examined the performance of six different clustering algorithms for the formation
of users’ communities. Our goal was to identify the ones that are best for creating
communities of users.

Fig. 2. The architecture of the integrated users' profile agents.

We have tested the following clustering algorithms for the task of forming user 
communities: 

1. Single link [Rasmussen 1992];

2. Complete link [Rasmussen 1992];

3. Group average [Rasmussen 1992];

4. Centroid method [Rasmussen 1992];

5. Mutual nearest neighbor [Gowda and Krishna 1978];

6. Shared nearest neighbor [Jarvis and Patrick 1973].

Most of the clustering methods that we examined on the space of user profiles were
unable to obtain satisfactory results. That is, with the exception of the complete link
method, all the clustering algorithms that we examined resulted in one large
cluster, while the rest of the clusters contained one or two isolated user profiles. In
contrast, the complete link method obtained successful results: it produces tightly
bound clusters whose sizes are normally distributed.

The time complexity of the complete link algorithm, using the adjacency list
implementation developed by Ejgenberg and Lindel [Ejgenberg and Lindel 1997], is
$O(n^2 \log n)$, where n is the number of user profiles. More details about the
algorithm we used are available in [David 1998].
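For readers unfamiliar with complete-link clustering, here is a naive sketch that runs on weighted class vectors. It is not the $O(n^2 \log n)$ adjacency-list implementation cited above. Cosine similarity is an assumed choice of inter-user similarity, since the text does not fix one here, and stopping at a target number of communities `k` is likewise an assumption. The defining feature is that two clusters are only as similar as their least similar pair of members, which is why the method yields tightly bound clusters.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity (an assumed choice of inter-user similarity)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def complete_link(vectors: List[Sequence[float]], k: int) -> List[List[int]]:
    """Agglomerative complete-link clustering of user vectors into k clusters.

    The similarity of two clusters is that of their LEAST similar pair of
    members; the two most similar clusters are merged until k remain.
    Naive version for illustration; far slower than O(n^2 log n).
    """
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best, pair = -math.inf, (0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                link = min(sim[i][j] for i in clusters[a] for j in clusters[b])
                if link > best:
                    best, pair = link, (a, b)
        a, b = pair
        clusters[a].extend(clusters[b])   # merge b into a (b > a, so index a is stable)
        del clusters[b]
    return clusters


# Hypothetical class-weight vectors (sports, science, politics) for four users.
users = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.0, 0.1, 0.9], [0.1, 0.0, 0.9]]
print(complete_link(users, k=2))  # [[0, 1], [2, 3]]
```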
3.3 The Union Set Agent and the Community Agents

The union set agent belongs to the second hierarchy level of the integrated users 
profile agents (see Figure 2). The union set agent gets a set of communities from each 
of the clustering agents. The role of the union set agent is to unify communities from 
different regions that are very similar. By unifying very close community profiles
from different regions, we promote the efficiency of the broadcasting system by using
the bandwidth capacity more efficiently. The bandwidth designated for the
broadcasting program is limited and is split among the communities. By unifying
communities with very similar profiles, we eliminate redundant communities and
consequently increase the bandwidth available to each community.
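The union step can be sketched as follows. This is a simplified, greedy stand-in for the clustering method the union set agent uses, built on several assumptions: each clustering agent reports a representative class vector and a user count per community, similarity is cosine, and the hypothetical `threshold` parameter decides when two communities count as "very similar". Only communities from different regions are merged. The returned map tells each clustering agent which unified representative replaces each of its communities.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (region, local community id, representative class vector, number of users)
Community = Tuple[str, int, List[float], int]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def unify(communities: List[Community], threshold: float = 0.95
          ) -> Tuple[List[List[float]], Dict[Tuple[str, int], int]]:
    """Greedily merge near-identical communities from different regions.

    A community joins an existing group only if the group has no community
    from the same region and every member is at least `threshold` similar
    (a complete-link-style test). Returns the unified representatives
    (size-weighted averages) and a map from (region, local id) to group id.
    """
    groups: List[List[Community]] = []
    for comm in communities:
        region, _, vec, _ = comm
        for group in groups:
            if (all(m[0] != region for m in group)
                    and all(cosine(vec, m[2]) >= threshold for m in group)):
                group.append(comm)
                break
        else:                      # no suitable group: start a new one
            groups.append([comm])

    representatives: List[List[float]] = []
    mapping: Dict[Tuple[str, int], int] = {}
    for gid, group in enumerate(groups):
        total = sum(m[3] for m in group)
        dim = len(group[0][2])
        representatives.append(
            [sum(m[2][i] * m[3] for m in group) / total for i in range(dim)])
        for m in group:
            mapping[(m[0], m[1])] = gid
    return representatives, mapping


comms = [("north", 0, [0.9, 0.1], 30), ("south", 0, [0.88, 0.12], 10),
         ("north", 1, [0.1, 0.9], 20)]
reps, mapping = unify(comms)
print(mapping)  # {('north', 0): 0, ('south', 0): 0, ('north', 1): 1}
```

Because the merge is greedy, the result can depend on the order in which communities arrive. A full clustering method, as the text describes, avoids that.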

The input of a community agent is a set of user profiles, which it receives from all
the clustering agents. The representative profile of the community is its center: we
define the centroid of the community to be the average profile of the community. The
weight of each field in the community profile is the average of the relative weights
of the corresponding fields of all the community members. The community agent considers the
weights in higher levels as a factor when calculating the weight of each field.
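The text does not spell out exactly how higher-level weights enter the computation, so the sketch below shows one plausible reading and should not be taken as the authors' formula. It covers only the first two levels, for brevity. Class weights are plain averages over all members. Each subclass weight is then averaged using the member's class weight as its factor, so users who care more about a class have more say in how that class splits into subclasses. With this scheme, every resulting vector still sums to 1 whenever the members' vectors do.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Two levels of a profile for brevity: class -> (class weight, {subclass: weight})
Profile = Dict[str, Tuple[float, Dict[str, float]]]


def community_profile(members: List[Profile]) -> Profile:
    """Average member profiles, letting class weights act as factors.

    Class weights are plain averages over all members (absent classes count
    as 0). A subclass weight is the average of the members' subclass weights,
    each weighted by that member's class weight.
    """
    n = len(members)
    class_total: Dict[str, float] = defaultdict(float)
    sub_total: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for profile in members:
        for cls, (class_w, subs) in profile.items():
            class_total[cls] += class_w
            for sub, sub_w in subs.items():
                sub_total[cls][sub] += class_w * sub_w
    result: Profile = {}
    for cls, total in class_total.items():
        subs = {s: v / total for s, v in sub_total[cls].items()} if total > 0 else {}
        result[cls] = (total / n, subs)
    return result


a = {"sports": (0.8, {"football": 1.0}), "science": (0.2, {"physics": 1.0})}
b = {"sports": (0.2, {"tennis": 1.0}),
     "science": (0.8, {"physics": 0.5, "biology": 0.5})}
print(community_profile([a, b]))
```

Up to floating-point rounding, the example gives sports a weight of 0.5, split 0.8/0.2 between football and tennis, and science a weight of 0.5, split 0.6/0.4 between physics and biology.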



3.4 Related Work 

The concept of generalization of user profiles has been introduced in other 
applications for different purposes. Bruce Krulwich [Krulwich 1997] presented a 
method for generating user profiles that takes advantage of a large-scale database of 
demographic data. In the broadcasting system there is no need to generalize the user 
profile, but rather to compute a community profile, which generalizes and integrates 
the interests of all the community's members. Our agents do not need outside data to 
compute a community profile, and unlike Krulwich's approach that is based on 
predetermined clusters, the broadcasting system computes the clusters of user profiles 
dynamically by applying a clustering method. These clusters form the communities of 
the broadcasting system's users. 

Another method that uses the concept of generalization of user profiles for 
constructing user communities was proposed by [Paliouras et al. 1998]. Their method
is based on unsupervised learning methods known as conceptual clustering. In
contrast, our method is based on clustering algorithms from information retrieval.
Their main purpose was to construct communities that are significantly different
from each other, while the main purpose of the community construction task in our
system is to help the broadcasting system decide which information to broadcast and
which not to. In our experience, the communities of a user population are not significantly
different. That is the reason why many of the clustering algorithms that we examined failed
to form balanced clusters, and we believe that this is the reason why the results of
Paliouras' experiments were quite disappointing.
4 The Broadcasting Agent

At the top of the information flow from the users to the broadcasting system is the
broadcasting agent. The broadcasting agent receives the integrated users profile, which is
the processed feedback sent by the users of the broadcasting system. Based on the
information the broadcasting agent receives, it should prepare a broadcasting program
that will maximize the satisfaction of the user community. Aside from the
broadcasting program, the broadcasting agent should handle the document collection, 
including the extraction of the document representation, insertion of new documents 
into the correct location, and removal of useless documents. The task of the 
broadcasting agent can be distributed among a group of agents, each specializing in 
performing a specific subtask. The tasks we consider in our current research are: 

♦ Extracting the multimedia document representation based on the information
that accompanies the multimedia document. (The manner in which the
broadcasting agent performs this task is described in [David 1998].) 

♦ Scoring all the documents in the broadcasting document collection and 
preparing a broadcasting program for each community. 

The broadcasting agent measures the similarity between the community profiles and 
the document collection. Different communities rate the documents differently. 
Therefore, for each document in the document collection, the broadcasting agent 
prepares a vector of scores with respect to each community profile. In other words, 
for each document we have a vector of scores, where score S_i is the score that the
document received from community i. The function that the broadcasting agent uses
for scoring the documents is as follows: 

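$$
\mathit{Score}_{\mathit{community\_profile},\,\mathit{document}} =
\mathit{Community\_Class\_Weight}_c \times \mathit{Community\_Subclass\_Weight}_s \times
\sum_{d=1}^{D} \mathit{Com\_prof\_desc}_d \times
\frac{\sum_{k=1}^{K} \mathit{doc\_term}_k \times \mathit{prof\_term}_{d,k}}
{\sqrt{\sum_{k=1}^{K} \mathit{doc\_term}_k^2 \times \sum_{k=1}^{K} \mathit{prof\_term}_{d,k}^2}}
$$
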
where Community_Class_Weight_c is the weight of class c according to the
community profile, and Community_Subclass_Weight_s is the weight of subclass s of
class c of the community profile. Com_prof_desc_d is the weight of description field d
of the community profile. D is the number of description fields in the description vector
of the profile, and K is the number of terms in the term vector, which is the fourth level of
the profile and the second level of the document representation. The complexity of the
scoring phase is O(L·M), where L is the number of communities and M is the
number of documents in the collection of the broadcasting system.
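The scoring function can be sketched directly from its definition. The sketch rests on several assumptions. The nested-dictionary layout of the community profile is hypothetical. The document's class, subclass, and term vector (taken from its keywords field) select the branch to score. Each description field carries its own term vector. Building one score vector per document takes L·M evaluations, which matches the stated complexity.

```python
import math
from typing import Dict, List

Terms = Dict[str, float]


def cosine(doc_terms: Terms, prof_terms: Terms) -> float:
    """Normalized inner product of two sparse term vectors."""
    dot = sum(w * prof_terms.get(t, 0.0) for t, w in doc_terms.items())
    norm = math.sqrt(sum(w * w for w in doc_terms.values())
                     * sum(w * w for w in prof_terms.values()))
    return dot / norm if norm else 0.0


def score(community: dict, doc: dict) -> float:
    """Class weight x subclass weight x sum over descriptions d of
    (description weight x cosine(document terms, terms under d))."""
    cls = community.get(doc["class"])
    sub = cls["subclasses"].get(doc["subclass"]) if cls else None
    if sub is None:
        return 0.0                     # community has no interest in this branch
    total = sum(desc["weight"] * cosine(doc["terms"], desc["terms"])
                for desc in sub["descriptions"])
    return cls["weight"] * sub["weight"] * total


def score_vectors(communities: List[dict], docs: List[dict]) -> List[List[float]]:
    """For each document, its vector of scores S_i, one per community:
    L * M evaluations of `score`."""
    return [[score(c, d) for c in communities] for d in docs]


# Hypothetical community profile and document.
community = {"sports": {"weight": 0.6, "subclasses": {"football": {
    "weight": 0.5,
    "descriptions": [{"weight": 1.0, "terms": {"goal": 0.5, "league": 0.5}}]}}}}
doc = {"class": "sports", "subclass": "football",
       "terms": {"goal": 0.5, "league": 0.5}}
print(score_vectors([community], [doc]))  # [[0.3]]
```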

The phase following the scoring phase involves the preparation of a
broadcasting program for each community. The input of the broadcasting program
preparation process includes the scores of the document collection, a set of
community profiles, and the broadcasting capacity (if there is a significant restriction
on the broadcasting capacity). The output of the broadcasting agent is the
broadcasting program, which includes a set of documents that were selected for
broadcasting by the algorithm, and the community programs, which are lists of
document IDs for each community.

The task of preparing the broadcasting program includes five main steps: 

♦ Dividing the broadcasting capacity among the communities
proportionally to their sizes.

♦ Selecting, for each community, the highest-scoring documents that fill
its portion of the broadcasting capacity.

♦ Returning to the first step and redistributing the capacity freed because
documents were selected by more than one community, until no new document
is selected.

♦ Allocating the remaining capacity to the largest community and selecting
its next highest-scoring documents.

♦ Identifying, for each community, additional relevant documents in the
broadcasting program.
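The five steps can be sketched as follows. This is an illustration under explicit assumptions, not the authors' algorithm, whose details are given in [David 1998]. Capacity is counted in documents, and each community's share is the floor of its proportional part. A document chosen by several communities occupies one broadcast slot, which is what frees capacity for the next round. Step 5 uses a hypothetical score cut-off, `extra_threshold`.

```python
from typing import Dict, List, Set, Tuple


def prepare_program(scores: Dict[str, List[float]], sizes: List[int],
                    capacity: int, extra_threshold: float = 0.5
                    ) -> Tuple[Set[str], List[List[str]]]:
    """Sketch of the five steps. scores[doc][i] is the score S_i of doc for
    community i; sizes[i] is the number of users in community i; capacity is
    measured in documents. Returns (broadcast set, community programs)."""
    n_comm, total_users = len(sizes), sum(sizes)
    # Each community's documents, best first (ties keep input order).
    ranked = [sorted(scores, key=lambda d: scores[d][i], reverse=True)
              for i in range(n_comm)]
    programs: List[List[str]] = [[] for _ in range(n_comm)]
    chosen: List[Set[str]] = [set() for _ in range(n_comm)]
    broadcast: Set[str] = set()

    def pick(i: int, quota: int) -> int:
        """Community i takes its next `quota` best documents; returns how many
        are new to the broadcast (only those consume capacity)."""
        new = 0
        for d in ranked[i]:
            if quota == 0:
                break
            if d in chosen[i]:
                continue
            programs[i].append(d)
            chosen[i].add(d)
            quota -= 1
            if d not in broadcast:
                broadcast.add(d)
                new += 1
        return new

    # Steps 1-3: split the free capacity in proportion to community size and
    # repeat, because a document chosen by several communities uses one slot.
    while True:
        free = capacity - len(broadcast)
        quotas = [free * s // total_users for s in sizes]
        if sum(pick(i, q) for i, q in enumerate(quotas)) == 0:
            break

    # Step 4: leftover capacity goes to the largest community.
    largest = max(range(n_comm), key=lambda i: sizes[i])
    for d in ranked[largest]:
        if len(broadcast) >= capacity:
            break
        if d not in broadcast:
            broadcast.add(d)
            programs[largest].append(d)
            chosen[largest].add(d)

    # Step 5: give each community broadcast documents it scores highly
    # (extra_threshold is a hypothetical relevance cut-off).
    for i in range(n_comm):
        for d in ranked[i]:
            if d in broadcast and d not in chosen[i] and scores[d][i] >= extra_threshold:
                programs[i].append(d)
                chosen[i].add(d)
    return broadcast, programs


scores = {"d1": [0.9, 0.9], "d2": [0.8, 0.85], "d3": [0.3, 0.7],
          "d4": [0.6, 0.2], "d5": [0.1, 0.4]}
broadcast, programs = prepare_program(scores, sizes=[2, 2], capacity=4)
print(sorted(broadcast), programs)
# ['d1', 'd2', 'd3', 'd4'] [['d1', 'd2', 'd4'], ['d1', 'd2', 'd3']]
```

In the example, both communities first pick d1 and d2, which use only two of the four slots. The freed capacity then lets each community add one more document of its own.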

The overall complexity of preparing a broadcasting program is

$$O(L \cdot M \cdot C^2 \log C) + O(M \cdot C \log C) + O(L \cdot C) = O(L \cdot M \cdot C^2 \log C)$$

where:

♦ L: number of communities. 

♦ M: number of documents in the document collection. 

♦ C: number of documents in the broadcasting capacity. 

For more details, see [David 1998].



5 Simulation, Tests, and Results 

A broadcasting environment is composed of two main components: information 
which is broadcast and users that consume the broadcasting information. In order to 
evaluate and analyze the performance of our proposed model and techniques, we need 
to simulate the broadcasting system’s environment, since, currently, a system that 
broadcasts to users' PCs does not exist. The simulation of a broadcasting system’s 
environment consists of the following two main components: 

1. A document collection.

2. A set of user profiles.

We used a static document collection in order to analyze the performance of the
algorithms we proposed for the integrated users profile agents. A set of 1400
documents downloaded from USENET serves as our static document collection.
As mentioned above, a broadcasting system like the one we consider in this
research does not yet exist. Therefore, no real data on the preferences or the
distribution of preferences among the users are available. We decided to conduct a poll
in such a manner that each questionnaire would result in a user profile. The 200
respondents were asked to rate subjects and sub-subjects of interest on a scale of 1 to 5.

We simulated a set of user profiles based on the poll results. The set of user profiles
from the poll serves as the initial set of user profiles.² Since we wanted to analyze the
algorithm using different and larger sets of users, we generated new populations
by applying a genetic algorithm to the initial set of user profiles, using mutation and
crossover [Goldberg 1989, Mitchell 1997].
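The population generation can be sketched as follows. The genetic operators here are assumptions, since the text does not define them. Profiles are reduced to a class vector plus one subclass vector per class. Crossover copies whole vectors from either parent, so crossover alone changes no value, which matches how Set 2 is described below. Mutation shifts a field by plus or minus an offset with a per-field probability, then renormalizes the vector. The defaults 0.04 and 0.1 are the values reported below for Set 1. The seed profiles are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Profile:
    classes: List[float]             # class-weight vector (sums to 1)
    subclasses: List[List[float]]    # one subclass-weight vector per class


def normalize(v: Sequence[float]) -> List[float]:
    """Rescale to sum to 1 (uniform if every weight is zero)."""
    total = sum(v)
    return [x / total for x in v] if total > 0 else [1.0 / len(v)] * len(v)


def crossover(a: Profile, b: Profile, rng: random.Random) -> Profile:
    """Child takes the class vector of one parent and, class by class, the
    subclass vector of a randomly chosen parent; values are copied unchanged."""
    classes = list(rng.choice((a, b)).classes)
    subclasses = [list(rng.choice((sa, sb)))
                  for sa, sb in zip(a.subclasses, b.subclasses)]
    return Profile(classes, subclasses)


def mutate_vector(v: List[float], rng: random.Random,
                  p: float, offset: float) -> List[float]:
    """With probability p per field, shift its weight by +/- offset within
    [0, 1]; renormalize only if something changed."""
    out, changed = [], False
    for x in v:
        if rng.random() < p:
            x = min(1.0, max(0.0, x + rng.choice((-offset, offset))))
            changed = True
        out.append(x)
    return normalize(out) if changed else out


def mutate(prof: Profile, rng: random.Random,
           p: float = 0.04, offset: float = 0.1) -> Profile:
    return Profile(mutate_vector(prof.classes, rng, p, offset),
                   [mutate_vector(s, rng, p, offset) for s in prof.subclasses])


def generate_population(seeds: List[Profile], size: int, rng: random.Random,
                        use_mutation: bool = True) -> List[Profile]:
    """use_mutation=True mimics Set 1; False (crossover only) mimics Set 2."""
    population: List[Profile] = []
    while len(population) < size:
        a, b = rng.sample(seeds, 2)
        child = crossover(a, b, rng)
        population.append(mutate(child, rng) if use_mutation else child)
    return population


# Hypothetical seed profiles: three classes, two subclasses each.
seeds = [Profile([0.5, 0.5, 0.0], [[1.0, 0.0], [0.5, 0.5], [0.5, 0.5]]),
         Profile([0.1, 0.2, 0.7], [[0.3, 0.7], [0.9, 0.1], [0.2, 0.8]]),
         Profile([0.3, 0.3, 0.4], [[0.6, 0.4], [0.5, 0.5], [0.0, 1.0]])]
rng = random.Random(0)
set1 = generate_population(seeds, 1000, rng)
set2 = generate_population(seeds, 1000, rng, use_mutation=False)
```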

We propose using the recall and precision measurements [Salton and McGill 1983] 
in order to evaluate the performance of the overall broadcasting system that we 
developed. The relevancy of a certain document for a given user was calculated using
the following relevancy function and a relevancy threshold.



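$$
\mathit{Relevancy}_{\mathit{profile},\,\mathit{document}} =
\mathit{Profile\_Class\_Weight}_c \times \mathit{Profile\_Subclass\_Weight}_s \times
\frac{\sum_{k=1}^{K} \mathit{doc\_term}_k \times \mathit{prof\_term}_k}
{\sqrt{\sum_{k=1}^{K} \mathit{doc\_term}_k^2 \times \sum_{k=1}^{K} \mathit{prof\_term}_k^2}}
$$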
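The evaluation can be sketched as a relevancy function plus standard recall and precision. The sketch makes several assumptions. A user's received documents are taken to be the user's community program. A user's relevant documents are the documents in the collection whose relevancy reaches the threshold, whose actual value is not stated in the text. Figures are averaged over users.

```python
import math
from typing import Dict, Iterable, List, Set, Tuple

Terms = Dict[str, float]


def relevancy(class_w: float, subclass_w: float,
              doc_terms: Terms, prof_terms: Terms) -> float:
    """Profile class weight x subclass weight x cosine of the term vectors."""
    dot = sum(w * prof_terms.get(t, 0.0) for t, w in doc_terms.items())
    norm = math.sqrt(sum(w * w for w in doc_terms.values())
                     * sum(w * w for w in prof_terms.values()))
    return class_w * subclass_w * (dot / norm if norm else 0.0)


def recall_precision(received: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """received: documents in the user's community program;
    relevant: documents whose relevancy for the user reaches the threshold."""
    hits = len(received & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(received) if received else 0.0
    return recall, precision


def average_over_users(pairs: Iterable[Tuple[Set[str], Set[str]]]) -> Tuple[float, float]:
    """Average recall and precision over (received, relevant) pairs, one per user."""
    results: List[Tuple[float, float]] = [recall_precision(r, rel) for r, rel in pairs]
    n = len(results)
    return (sum(r for r, _ in results) / n, sum(p for _, p in results) / n)


print(recall_precision({"d1", "d2", "d3"}, {"d1", "d4"}))  # (0.5, 0.3333333333333333)
```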



All tests were applied to three different sets of users. The first set is the set of 200
original user profiles from the poll. Each of the other two sets contains 1000 user
profiles. Set 1 was generated by applying the crossover and mutation operations to the
200 user profiles from the poll. The mutation probability, that is, the probability that the
mutation operation changes the value of a given field, was fixed at 0.04, and the offset of
each change was 0.1. To generate Set 2, we applied only the crossover operation to the 200
user profiles from the poll; no value was changed, so only crossover influenced the
resulting user profiles. As expected, the 1000 users of Set 2 formed more centralized,
higher-density communities, while the communities formed on Set 1 were more distributed
and less centralized. The results show that the more centralized the communities, the
better the results, that is, the higher the user satisfaction. Thus we can conclude that if
the users form highly dense and centralized communities, as we assume the users of the
broadcasting system do, the algorithms we propose perform best. Each of the
following subsections discusses one test, including the aim of the test, the sets of
users used for the test, the results of the test, and the conclusions.



2 The poll collected information on the first two levels of the profile. The next two levels of the
user profiles were generated by the simulation, using a normal distribution for the weights over the
interval [0, 1].
5.1 The Effect of Dividing the Users into More Communities

How does the clustering process affect the results of the algorithm? Is it really worth
dividing the users into communities, or would user satisfaction perhaps increase if
the users were not divided into communities? In order to analyze this aspect, we ran a
simulation with the users from the poll, Set 1, and Set 2, and in each run
we increased the number of communities that the algorithm would form.
Fig. 3. The effect of dividing the users into more communities (users from the poll).
Fig. 4. The effect of dividing the users into more communities (Set 1).
Fig. 5. The effect of dividing the users into more communities (Set 2).



The specific resulting values are strongly influenced by the following parameters: 
the set of documents; the set of user profiles; the similarity between the user profiles 
and the documents; and the inter-user similarity. In each environment, the average 
number of relevant documents per user is different. The values of recall and precision 
increase with the average number of relevant documents and with the inter-user 
similarity. 
The recall in the three tests decreases as the number of communities increases
(Figures 3, 4, and 5). However, it is rather stable from 20 communities onward. On the
other hand, the precision increases with the number of communities. We see that the 
behavior of the recall and precision is similar in the three tests. This demonstrates that 
the system is stable under different sets of user profiles. However, there is a slight 
difference between the results of the three tests, resulting from the fact that they are 
performed in slightly different environments. The conclusion is that dividing the users 
into more communities is beneficial since the recall barely changes (and sometimes 
even increases), while the precision increases. In other words, the number of relevant 
documents a user receives remains almost the same, while the number of irrelevant 
documents decreases. 



5.2 The Effect of Additional Users 

What is the effect of adding more users to the system under the same constraints (e.g.,
the same broadcasting system capacity)? For each set, we examined the system with 200,
400, 600, 800, and 1000 users. In all the runs, we divided the users into 20
communities, the broadcasting capacity was fixed at 300 documents, and the
relevancy threshold was the same. We then calculated the average recall and average
precision for each number of users. The results are presented in the following
figures and tables.
Fig. 6. Testing the effect of additional users in the system (Set 1).
Fig. 7. Testing the effect of additional users in the system (Set 2).

The results are satisfying, since they demonstrate that adding more users to the
system has almost no effect on the performance; namely, the recall and precision
values remain nearly the same as the number of users increases. We hypothesize
that it is beneficial to have more and more users, since the profit of the system
increases drastically while the individuals’ loss is negligible; however, more
simulations with larger numbers of users are needed. This is a very significant result
for a broadcasting environment, which will most probably serve many users.



5.3 The Effect of Identifying Additional Relevant Documents 

This test examines the effect of adding the procedure that identifies additional
relevant documents that were selected for broadcasting but have not yet been assigned to
a given community (see Section 4). Does this extension of the community programs
improve the results? In one series of runs we disable the procedure that identifies
additional relevant documents, while in the second series we enable it.
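The procedure itself is described in Section 4, which is outside this excerpt, so the following is only a rough sketch under hypothetical assumptions. It assumes that community profiles and documents are term-weight vectors, that relevance is cosine similarity compared against the relevance threshold, and that `extend_program` and the other names are illustrative. The extension adds to a community's program any document already selected for broadcasting (for some community) that is relevant to this community but not yet assigned to it. Because those documents are broadcast anyway, the step uses no extra broadcasting capacity.

```python
import math


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse term-weight vectors (term -> weight)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extend_program(profile: dict, program: set, broadcast: dict, threshold: float) -> set:
    """Hypothetical extension step for one community.

    profile   -- the community's term-weight vector
    program   -- IDs of documents already assigned to this community
    broadcast -- every document selected for broadcasting: ID -> term-weight vector
    threshold -- relevance threshold on the cosine similarity
    """
    extra = {
        doc_id
        for doc_id, vector in broadcast.items()
        if doc_id not in program and cosine(profile, vector) >= threshold
    }
    return program | extra


# Hypothetical broadcast set: document 3 was selected for another community.
broadcast = {
    1: {"sports": 1.0},
    2: {"politics": 1.0},
    3: {"sports": 0.5, "news": 0.5},
}
print(sorted(extend_program({"sports": 1.0}, {1}, broadcast, threshold=0.5)))
```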
Fig. 8. The recall (Set 1).
Fig. 9. The precision (Set 1).

The results are quite satisfying. Adding relevant documents that were selected for
broadcasting but not yet assigned to a community significantly increases the recall
(Figure 8), while the precision is hardly affected. In spite of the extension, the recall
still decreases as the number of communities increases. Nevertheless, the recall values
when the extension is enabled are greater than when it is disabled. That is,
the users obtain more relevant documents when the extension is enabled. Another
advantage of enabling the extension is that the recall remains fairly stable in the
range of 20 communities and more, since a community is then not limited to the
documents in its own share of the capacity, as long as relevant documents were chosen
to be broadcast. In contrast, when the extension is disabled, the recall always
decreases as the number of communities increases, because each community's portion
of the broadcasting capacity decreases as the number of communities increases. We present
here only the results of Set 1, but similar results were achieved on the other two sets of
users (the user profiles from the poll and Set 2).
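The argument about shrinking capacity portions can be made concrete with a toy model. Assume a fixed broadcasting capacity, such as the 300 documents used in Section 5.2, split equally among k communities; the equal split is our assumption, since the paper does not say how capacity is divided. Also assume a community has a hypothetical number of relevant documents. Without the extension, a community's users can receive only documents from its own share, and that share caps its recall:

```python
def share_per_community(capacity: int, k: int) -> int:
    """Documents allotted to each of k communities under an (assumed) equal split."""
    return capacity // k


def recall_cap_without_extension(capacity: int, k: int, relevant: int) -> float:
    """Upper bound on a community's recall when it receives only its own share."""
    return min(share_per_community(capacity, k), relevant) / relevant


# Hypothetical community with 20 relevant documents.
for k in (10, 20, 30, 50):
    cap = recall_cap_without_extension(300, k, relevant=20)
    print(f"k={k:2d} share={share_per_community(300, k):2d} recall cap={cap:.2f}")
```

When the extension is enabled this cap no longer applies, because documents broadcast on behalf of other communities can also be picked up. That is consistent with the flatter recall curve reported above.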



6 Conclusion 

The broadcasting system we refer to consists of one broadcasting entity which 
broadcasts information to a large number of personal computer users, who can down- 
load information to their PC disks. In particular, we concentrate on broadcasting 
multimedia information to PCs. 

The main concern of this paper is to show how the broadcasting system can decide 
which information to broadcast in order to satisfy the users’ needs and interests. The 
initial answer is that the system needs to receive feedback from the users on their 
specific interest domains. We have designed a distributed software agent system that 
will enable the broadcasting system to receive feedback from the users. 

The simulation results support our hypothesis that these algorithms provide 
broadcasting programs which are of great interest to the users. 



Bibliography

1. E. David. 1998. Agents for Information Broadcasting. Master's thesis, Department of
Computer Science, Bar-Ilan University.

2. P. Edwards, D. Bayer, C. L. Green and T. R. Payne. 1996. Experience with learning agents
which manage Internet-based information. AAAI Spring Symposium on Machine Learning
in IA, Scotland.

3. Y. Ejgenberg and Y. Lindel. 1997. B.Sc. project. Computer Science Department, Bar-Ilan
University.

4. M. J. Franklin and S. Zdonik. 1996. Dissemination-based information systems. IEEE Data
Engineering Bulletin, Vol. 19, No. 3, pp. 20-30.

5. K. C. Gowda and G. Krishna. 1978. Agglomerative clustering using the concept of mutual
nearest neighborhood. Pattern Recognition, Vol. 10, No. 2, pp. 105-112.

6. D. E. Goldberg. 1989. Genetic Algorithms in Search, Optimization and Machine Learning.
Addison-Wesley.

7. K. J. Hammond, R. Burk, and K. Schmitt. 1994. A case-based approach to knowledge
navigation. In Proceedings of AAAI Workshop on Indexing and Reuse in Multimedia
Systems, pp. 45-57.

8. R. A. Jarvis and E. A. Patrick. 1973. Clustering using a similarity measure based on shared
near neighbors. IEEE Transactions on Computers, Vol. C-22, No. 11, pp. 1025-1034.

9. B. Krulwich. 1997. Lifestyle Finder: Intelligent user profiling using large-scale
demographic data. AAAI, Summer 1997, pp. 37-45.

10. K. Lang. 1995. NewsWeeder: Learning to filter netnews. Proceedings of the International
Conference on Machine Learning, pp. 331-339.

11. P. Maes and R. Kozierok. 1993. Learning interface agents. Proceedings of AAAI-93,
Washington, D.C., pp. 459-46.

12. M. Mitchell. 1997. An Introduction to Genetic Algorithms. A Bradford Book, The MIT
Press, Cambridge, Massachusetts.

13. A. Moukas. 1996. Amalthaea: Information discovery and filtering using a multi-agent
evolving ecosystem. The First International Conference on the Practical Application of
Intelligent Agents and Multi-Agent Technology, pp. 421-436.

14. K. Nygren, I. M. Jonsson and O. Carlvik. 1996. An agent system for media on demand
services. The First International Conference on the Practical Application of Intelligent
Agents and Multi-Agent Technology, pp. 437-454.

15. G. Paliouras, C. Papatheodorou, V. Karkaletsis, C. Spyropoulos, V. Malaveta. 1998.
Learning User Communities for Improving the Services of Information Providers.
Conference on Research and Advanced Technologies for Digital Libraries, Greece.

16. E. Rasmussen. 1992. Information Retrieval: Data Structures and Algorithms. Editors:
W. B. Frakes and R. Baeza-Yates. Prentice Hall Inc., Englewood Cliffs, N.J.

17. C. J. Van Rijsbergen. 1979. Information Retrieval, Second Edition. Butterworth & Co.
(Publishers) Ltd.

18. G. Salton and M. J. McGill. 1983. Introduction to Modern Information Retrieval.
McGraw-Hill.

19. U. Shardanand and P. Maes. 1995. Social information filtering: Algorithms for automating
"Word of Mouth". ACM CHI'95 Mosaic of Creativity, pp. 210-217.

20. B. D. Sheth. 1994. A Learning Approach to Personalized Information Filtering. Master's
thesis, MIT Media Lab.

21. A. Tomasic, H. Garcia-Molina, and K. Shoens. 1994. Incremental updates of inverted lists
for text document retrieval. ACM SIGMOD, pp. 289-300.




On the Evaluation of Agent Architectures 



Henry Hexmoor (1), Marcus Huber (2), Jörg P. Müller (3), John Pollock (4), and Donald Steiner (5)

(1) Department of Computer Science, University of North Dakota,
Grand Forks ND, U.S.A., hexmoor@cs.und.edu
(2) Intelligent Reasoning Systems and Oregon Graduate Institute of Science and Technology,
Portland, Oregon 97291-1000, USA, marcush@cse.ogi.edu
(3) Siemens Corporate Research, Munich, Germany,
joerg.mueller@mchp.siemens.de
(4) Department of Philosophy, University of Arizona, Tucson, Arizona 85721, USA,
pollock@arizona.edu
(5) Siemens Technology To Business, Berkeley, USA,
donald.steiner@ttb.siemens.com



1 Introduction 

By now, intelligent agents have been on the research agenda of the computer science
community for roughly a decade. Still, control architectures for autonomous intelligent
systems have been an important research issue for much longer, going as far back
as James Watt's steam engine control mechanism based on mechanical feedback.
More recent work includes the development of mathematical models for control in
the field of cybernetics (most notably, by Wiener). Also, arguably one of the biggest
contributions of more than forty years of research in Artificial Intelligence has been
methods and architectures aiming to describe, control, and adapt intelligent autonomous systems.

With the success of the ATAL workshop series, research in agent architectures gained
considerable importance, and as a consequence, a large number of agent architectures
were developed and described in the literature (see [Mül96]), many of them
in books published in the Intelligent Agents series (see [WM99] for an overview). This
work included architectures that support autonomous hardware or software systems in
planning towards and achieving their goals, in reacting to unforeseen events, in dealing
with uncertainty and change, and in interacting meaningfully with other agents.

However, despite the large number of architectures described in the literature, until
one year ago there was hardly any research investigating how these architectures can
be evaluated. A paper investigating this question from an empirical perspective was
presented in the proceedings of ATAL-98 [Mül99]. The volume at hand contains two papers
that discuss recent results in evaluating agent architectures [WL00] [LY00]. It seems
obvious that the availability of (theoretical or empirical) methods or methodologies for
comparing and evaluating different agent architectures is a crucial precondition for the
practical design of agents.

The ATAL-99 special track on the evaluation of agent architectures sought insights 
on the strengths of various architectures in different task domains. The purpose of this 
panel, which was held as a part of the special track, was thus to discuss criteria and 
approaches for evaluating and comparing agent control architectures, and to identify 
promising areas for future research on this important topic. 



N.R. Jennings and Y. Lesperance (Eds.): Intelligent Agents VI, LNAI 1757, pp. 106-116, 2000.
(c) Springer-Verlag Berlin Heidelberg 2000
2 Questions for the Panelists



The panelists were asked to express their opinions on a number of questions related to 
the panel topic: 

1. What do you consider the most essential properties and building blocks of agent
architectures to support autonomous action in dynamic, uncertain, or real-time en- 
vironments? 

2. Are there any rules to help designers of agent systems to decide which type of ar- 
chitecture to choose for a certain application domain (e.g., reactive, hybrid, layered, 
component-based)? Are certain architectures particularly suitable / unsuitable for 
certain application domains? 

3. Does the agent architecture really matter, at all? 

4. How can the effectiveness of agent architectures be evaluated? 

5. How can we compare two agent architectures? What could benchmarks for agent 
architectures look like? 

6. Why are there so many agent architectures? 

7. Do you think there is a need / possibility for standardisation efforts at this level? 

8. What do you foresee to be the most urgent and interesting research topics in the area 
of agent architectures? 

3 Response by Henry Hexmoor 

3.1 Agent Architecture Requirements Tradeoffs 

Like biological systems, artificial agents are generally designed to minimise time
and effort, and to maximise performance, robustness, reliability, coherence, and
autonomy. Many of the requirements (also called qualities) of artificial agents are
interrelated, which makes quantification and evaluation difficult. We point out a few
tradeoffs.



Timeliness and Purposefulness tradeoff Two key parameters that affect this tradeoff are
task dependency and nondeterminism of the domain. A problem domain is complex if the
tasks are highly interdependent and nondeterministic. Two other parameters that affect
the tradeoff are the agent's anticipatory power and its perceptual selectivity. An
agent is highly sophisticated if it possesses great anticipatory power and is perceptually
selective. Often, doing a task mindfully slows the agent down. Of course, purposeful
behaviour can save time when negative consequences of interactions among tasks can
be avoided. We make the following observations:

1. Rapid response is more beneficial in simple domains than in complex domains.

2. Agent complexity is more beneficial in complex domains than in simple
domains.
Coordination and Timeliness tradeoff A key area that affects this tradeoff is
communication. Often, coordination slows an agent down. Agents can increase coordination
through communication, but communication may cost time. Sometimes, coordination improves
timeliness. For example, team goals such as those in the game of soccer require a high
level of coordination, and mistakes can cost time. An agent needs to assess the right
level of coordination needed and, with that, decide how much communication will be
enough without jeopardising timeliness. We make the following observations:

1. Coordination is increased with more communication.

2. Timeliness is lowered with more communication.

Sociability and Situation awareness tradeoff A key area that affects this tradeoff is
communication. Sociable agents thrive on communication to impart information about
themselves to others and to find out about others, whereas goal-driven agents communicate
sparingly to enhance their perception of their surroundings while minding the cost
of communication. Situation awareness is generally increased by communication, but
communication may cost time and may even lower situation awareness. We make the
following observations:

1. Sociability is increased with more communication.

2. Communication should be used sparingly for situation awareness.

Autonomy and Failure Tolerance tradeoff Two key parameters that affect this tradeoff
are the problem complexity and the policy for task allocation. Problem complexity is
affected by constraints such as resources. An example of a simple domain is foraging;
an example of a complex domain is music. Policies for task allocation are affected by
considerations of agent role, task priority, and agent competence. We consider failure
tolerance to be invariance to changes in agent competence. Often, highly specialised and
autonomous groups of agents are susceptible to failure and degraded performance. Of
course, increased autonomy in terms of decentralised tasks and dynamic role assignment
among agents helps maintain a constant level of group performance. The agent can
monitor its satisfaction of internal qualities for autonomy and failure tolerance as well
as its contribution at the system level. We make the following observations:

1. In simple domains, independent agents have better failure tolerance than
cooperative agents.

2. In complex domains, independent agents have worse failure tolerance than
cooperative agents.

3. In simple domains, independent agents have a greater ability to complete tasks than
cooperative agents.

4. In complex domains, independent agents have a lesser ability to complete tasks than
cooperative agents.

3.2 Promising Research Areas 

We believe the most essential properties of agent architecture and the most urgent rese- 
arch we envision are the following: 
Contextual autonomy Autonomy of an agent's choice over a set of tasks is often dependent
on the context of all constraints such as resources and other agents. The concept
of autonomy needs to be defined and its impact on other system qualities needs to be
explored.

Model-integrated adaptation An agent’s ability to modify its original model of the world 
is important. 

Awareness of other agents An agent needs to consider other agents in its decision making 
and form explicit attitudes such as cooperation or competition. We should formalise levels 
of considering other agents. 

Communication and shared knowledge An agent needs to effectively communicate with 
others while maintaining an economy of communication content and maintaining goal- 
directedness. Often, shared knowledge is useful in parsimonious communication, which 
needs to be considered. 

Self-evaluation As we discussed earlier, agents need to assess various parameters that 
determine their need to change features that impact system qualities. 



4 Response by Marcus Huber 

For the panel on "Evaluating Agent Architectures", we were all asked to answer a number
of questions, and I will do so in a straightforward fashion here. I have been a principal
designer and implementor of two pragmatic agent architectures, JAM and UMPRS, and
routinely review other agent architectures for their strengths and weaknesses.

What are the essential properties for dynamic domains? In order for an agent to cope
with dynamic domains, an agent architecture must have some component whose time to
respond before taking action in the world is bounded. This "reactive" component
is common to most agent frameworks, although it manifests itself in many forms. For
example, CIRCA provides hard real-time reasoning capabilities to guarantee behaviour
in dynamic domains. Multi-tier architectures such as 3T, InteRRaP, and Atlantis all
have one component of their architectures dedicated to acting and responding quickly
according to the changing environment and the planning of higher-level cognitive layers.
UMPRS and JAM both employ a reactive component that is activated between the execution
of plan steps during normal execution.

An agent architecture also needs some mechanism for performing simultaneous or
interleaved planning and execution. Without such a capability, an agent will either get
bogged down with planning and not be able to respond to events in a timely manner, or
lose coherent goal-directed behaviour by becoming too myopically focused on only
responding to the dynamics of the domain. Here too, there are a couple of commonly
encountered schemes and multiple architectures that employ them. The two common approaches
are multi-tier architectures such as those introduced above, where a deliberation/planning
level coordinates with a reaction/execution layer, and interleaved planning and execution
schemes such as JAM and UMPRS, which switch between plan deliberation and
reactive behaviour according to the requirements of the situation.
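To illustrate the interleaved scheme, here is a toy interpreter loop in which a reactive check runs between plan steps, so that an incoming event can push a new plan on top of the current one and preempt it. This is a simplified sketch under our own assumptions, and the class and method names are hypothetical. It is not the actual JAM or UMPRS interpreter, both of which are considerably richer.

```python
from collections import deque
from typing import Callable, Dict, List

Step = Callable[[], None]


class InterleavedAgent:
    """Toy interpreter: runs one plan step per cycle and checks for events
    between steps, so a reactive plan can preempt the current plan."""

    def __init__(self) -> None:
        self.events: deque = deque()                              # pending events
        self.intentions: List[deque] = []                         # stack of active plans
        self.reactions: Dict[str, Callable[[], List[Step]]] = {}  # event -> plan factory

    def post(self, event: str) -> None:
        self.events.append(event)

    def run(self, max_cycles: int = 100) -> None:
        for _ in range(max_cycles):
            # Reactive check between plan steps: each handled event pushes a new plan.
            while self.events:
                event = self.events.popleft()
                if event in self.reactions:
                    self.intentions.append(deque(self.reactions[event]()))
            if not self.intentions:
                return                    # nothing left to do
            top = self.intentions[-1]
            if not top:
                self.intentions.pop()     # plan finished; resume the one below it
                continue
            top.popleft()()               # execute exactly one step of the top plan


# Usage: an obstacle noticed during step "a" preempts the main plan.
log: List[str] = []
agent = InterleavedAgent()
agent.reactions["obstacle"] = lambda: [lambda: log.append("avoid")]


def step_a() -> None:
    log.append("a")
    agent.post("obstacle")


agent.intentions.append(deque([step_a, lambda: log.append("b"), lambda: log.append("c")]))
agent.run()
print(log)
```

A more deliberative design would replace the fixed `reactions` table with plan selection from a plan library, which is where the deliberation side of the interleaving would live.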

A third crucial capability for agent architectures within dynamic domains is some
means to adapt their behaviour over time. Without the ability to adapt to a dynamic,
changing domain, an agent will slowly become less and less effective at performing
its duties. Adaptation and learning capabilities are much less frequently found in agent
architectures and, when found, are typically restricted to only narrow aspects of the agent.
One notable exception is the Soar agent architecture, which has learning as an integral
part of the agent.

Social modelling is the final critical property for agents within dynamic domains,
in particular dynamic multiagent domains. Ostensibly, agents within a multiagent domain will
want or need to coordinate with the other agents. Doing this in a timely manner requires
an agent to perform quick, internal coordination reasoning (e.g., plan recognition) rather
than use potentially slower communication-based schemes. Even when communication
is practical, internal models will facilitate timely coordination reasoning.



What rules of thumb would you suggest for selecting an agent architecture? In my
experience, agent-based applications are almost always constructed with some pre-existing,
"legacy" software or systems to which the agents wrap or interface. They are rarely
created from scratch, without any software or institutional constraints. Because of this, the
single most important factor that I have found for picking between agent architectures
is simply the computer language in which they are implemented. While it is possible
to interface between processes in disparate languages using such things as sockets and
language-level interfaces such as JNI, as a rule of thumb, the simplest approach is usually
just to stick with the language that predominates in the overall application.

The next most important rule of thumb is to determine the feature set required of 
the agent architecture by the application and match against the feature set provided by 
the architectures under consideration. Such features may include any of a number of 
criteria, such as autonomy, persistence, adaptability, and mobility. All agent architectu- 
res provide certain optimisations with respect to representation and reasoning and no 
agent architecture will be a perfect match. As a rule of thumb, finding the closest mat- 
ching architecture will reduce the amount of workarounds necessary during application 
development to compensate for capability mismatches. 
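The feature-matching rule of thumb can be sketched as a simple coverage ranking. In this minimal sketch the architecture names and feature sets are hypothetical placeholders, not claims about real systems, and the unmatched features stand in for the workarounds mentioned above. The implementation-language criterion discussed earlier could be applied as a filter before this step.

```python
def rank_architectures(required: set, candidates: dict) -> list:
    """Rank candidate architectures by how many required features they provide.

    required   -- features the application needs
    candidates -- architecture name -> set of features it provides
    Returns (name, coverage, missing) tuples, best coverage first; `missing`
    holds the features that would need workarounds.
    """
    ranked = []
    for name, provided in candidates.items():
        missing = required - provided
        coverage = 1.0 - len(missing) / len(required) if required else 1.0
        ranked.append((name, coverage, missing))
    # Highest coverage first; ties broken alphabetically for a stable result.
    ranked.sort(key=lambda entry: (-entry[1], entry[0]))
    return ranked


# Hypothetical example; these are not descriptions of real architectures.
required = {"autonomy", "persistence", "mobility"}
candidates = {
    "ArchA": {"autonomy", "persistence"},
    "ArchB": {"autonomy", "persistence", "mobility", "adaptability"},
    "ArchC": {"mobility"},
}
for name, coverage, missing in rank_architectures(required, candidates):
    print(f"{name}: {coverage:.2f} missing={sorted(missing)}")
```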

Another rule of thumb is to look at the agent architecture's programming level and
compare that to the optimal scheme for the application and domain. That is, look at
whether the agent is programmed at a native-code API level (e.g., C++ or Java) or using
a "knowledge-level" architecture-specific language. Some architectures such as Aglets
are entirely written at a source-code level (Java), while others (e.g., Soar, JAM) support
both levels of programming. This becomes an important issue when the agent-specific
language and conceptualisation is very different from standard programming paradigms
(which it usually is) and is unsuited to the domain or may entail a learning curve on
the part of agent programmers. A similar issue is the number of programming
languages required, as multi-tier agent architectures typically require a different language
for each tier in the architecture.
Does the architecture matter? Selecting the agent architecture for an application is an
extremely important decision. Each agent architecture has some form of specialisation
and optimisation with respect to representations and algorithms. Sometimes these
specialisations are for academic and theoretical reasons, sometimes they are for pragmatic,
industry-oriented reasons. The specialisations (e.g., multiagent plan representations)
may result in a superior product or facilitate development for some developers but may
weaken a product or hinder development for others. Regardless, each architecture tends
to be quite different from other architectures in at least one significant dimension. For
that reason, people selecting an architecture must be careful to correctly identify the
needs of the planned application, match them with the architecture's strengths, and
avoid the architecture's weaknesses.

How can an architecture's effectiveness be measured? Measuring the effectiveness of an
architecture is a tricky matter. Ostensibly, the idea is to compare one architecture
with another, or perhaps compare an agent-based approach to an approach not using 
agents. One measure of an architecture’s effectiveness is the amount of time or effort 
required to develop an application. As mentioned above, the various aspects of an agent 
architecture may facilitate or hinder development and an architecture that’s difficult to 
program will tend to be considered less effective. An issue related to development time 
is the debugging support provided by an architecture. Agent architectures, in particular 
those using knowledge-level programming, are more difficult to debug than API-only 
architectures due to the extra layer of interpretation for the knowledge-level components. 

Another measure of effectiveness is domain performance. Once the agent-based 
application is completed to a point that it is functional in the application domain, the 
application can be tested within that domain to see how well it performs. In many cases, 
the performance can be tested using industry-standard tests or benchmarks. In the
absence of objective benchmarks, more subjective measures, such as customer
satisfaction, might be used.

How can we compare two agent architectures? Exactly what the term "agent" means and
what the critical dimensions of agency are will probably never be agreed upon. In the
same fashion, any list of dimensions to measure and compare agent architectures is likely
to be incomplete or unfocused with respect to some particular framework or domain.
Nonetheless, below is a list of features and capabilities (many of them difficult to define
precisely in their own right) that might be used to characterise an agent architecture and
to compare two agent architectures. This list is a highly simplified and abbreviated
version of the one I use when performing an evaluation or comparison.

- Autonomy
- Temporal reasoning
- Adaptive
- Representation expressiveness
- Anticipation
- Capabilities
- Planning
- Reactivity
- Mobility
- Persistence
- Uncertain reasoning
- Dynamically modifiable
- Social ability
- Rationality
- Major components
- Believability (e.g., personality)
- Major reasoning phases
- Robustness
- Theoretical basis

Why are there so many agent architectures? The primary reason there are so many
different architectures is that each architecture is specialised or optimised for some
reason, be it ideology or pragmatics. Many architectures are the result of the particular
research agenda of some research institution. Other architectures originate from industrial
sources and tend to be focused on a particular application domain or industry paradigm.

Related to this, to some extent, is the "NIH" ("Not Invented Here") syndrome.
NIH syndrome arises when companies and institutions feel that they must create their own
architecture so that they are intimately familiar with all of its details and have full
ownership and control of the architecture. Using an architecture provided by an external
source necessitates accepting the design decisions made by the architecture developer
and a potentially long learning curve before a comfortable level of familiarity with
the architecture is achieved. Furthermore, not all agent architectures are provided with
source code, and those that are not cannot be modified, which is often a significant
limitation for many.

Another reason for the proliferation of agent architectures is the shift of favour from 
one language du jour to the next language du jour. For example, PRS-based architectures 
were implemented first in Lisp, then in C++, and most recently in Java. New architec- 
tures, or new versions of old architectures, will be developed when the next popular 
programming language rears its head. 

Should we have or impose standards? I do not think that the field of intelligent agents is
in any position for standards to be imposed. It is difficult to reach any form of consensus
on the definition of the term "agent", let alone impose standards on agent architectures.
FIPA (Foundation for Intelligent Physical Agents) has been involved in creating standards
for multiagent communities, but it turns out that this has little impact upon an individual
agent's internal structure (i.e., architecture). If anything, some effort should be made
to establish criteria that must be satisfied before the term "agent" can be applied to an
architecture or application, as the word has been too widely used, which will inevitably
result in a backlash against genuine agent researchers and application developers.

What are the most urgent or interesting research areas with respect to agent
architectures? I believe that all agent architectures must employ some form of learning
capability so that they can adapt to new and changing situations in dynamic environments
or optimise in more stable environments. Without the ability to adapt, agents will become
increasingly brittle and error-prone.

Another significant research area is that of computational models of personality and
character. This will be important relatively quickly within the entertainment industry
in such things as computer games and movies. The relatively short term will also see 
an increase in applications related to job training in jobs requiring interpersonal skills 
and achieved using simulated personalities. In the longer term, realistic personality and 
character models and algorithms will be increasingly important as computers become 
ubiquitous and more expectations are made upon software to be intelligent and somewhat 
personable. 
A widely recognised important research area is uncertain reasoning. Agents will
almost always be operating in domains where they lack complete information or where
the domain is simply too complex to deal with using exhaustive techniques (primarily
the domains where AI/agent techniques are supposed to provide benefits). A
great deal of research in probabilistic, possibilistic and other paradigms for reasoning
with uncertainty has been performed, but the field is certainly not solved, and any progress
made in this field can be used to advantage by agent architectures.

5 Response by John Pollock 

In my paper for the conference, I distinguished between anthropomorphic agents and 
goal-oriented agents. Anthropomorphic agents are those that can help human beings 
rather directly in their intellectual endeavours. These endeavours consist of decision 
making and data processing. An agent that can help humans in these enterprises must 
make decisions and draw conclusions that are rational by human standards of rationality. 
Anthropomorphic agents can be contrasted with goal-oriented agents, which are those 
that can carry out certain narrowly-defined tasks in the world. For goal-oriented agents, 
the objective is to get the job done, and it makes little difference how the agent achieves 
its design goal. In my paper, I focussed on anthropomorphic agents, and argued that a 
successful anthropomorphic agent must mimic human cognition fairly closely. 

On the other hand, it might be possible to define a metric of success for goal-oriented
agents and then evaluate an architecture in terms of the expected value of that metric.
This raises the possibility that goal-oriented agents could be deemed successful without
looking much like human beings. However, it seems likely that the definition of such a
metric will not be possible for goal-oriented agents operating in complex environments 
and designed to achieve goals that are hard to achieve in those environments. Even if it 
is possible, it is unlikely to give us much guidance in agent design. It is not as if there are 
continuous parameters that we can adjust to maximise the expected value. Instead, there 
are qualitatively dissimilar architectures to be compared. Given two architectures and 
a characterisation of the environment, we might be able to argue that one has a higher 
expected value of success, but we must propound the architectures independently of the 
success metric. 
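One possible formalisation of this idea is given below; the notation is ours, not the panelist's. Let E be a set of environments with probability P(e) for each environment e, and let m(A, e) be the success metric for architecture A in environment e:

```latex
\mathrm{EV}(A) = \sum_{e \in \mathcal{E}} P(e)\, m(A, e)
\qquad\text{and}\qquad
A_1 \succ A_2 \iff \mathrm{EV}(A_1) > \mathrm{EV}(A_2)
```

Here the candidates A_1, A_2, ... are a handful of qualitatively different, independently proposed architectures. The metric can rank designs that already exist, but it offers no continuous parameter to adjust, so it does not tell us how to construct a good design, which is the limitation noted above.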

I suspect that for many kinds of goal-oriented agents and environments, a human-like 
architecture is essential for success. First, note that the agent must share the following 
human capabilities: 

- The agent must be able to acquire information perceptually. 

- The agent must be able to learn general characteristics of its environment on the 
basis of individual instances of them. 

- The agent must be able to acquire information at one time and use it at a subsequent 
time (temporal projection). 

- The agent must be able to predict the consequences of actions and events. 

- The agent must be able to make and execute plans. 

I argued in my paper that all of these capabilities require defeasible cognition. The agent 
must be able to form beliefs on the basis of information currently available to it, and 
correct those beliefs later when new information makes that desirable. 
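The following minimal sketch illustrates the defeasible pattern described here: a conclusion is held because of a defeasible reason and is withdrawn once a defeater becomes known. It uses the familiar perceptual example from the defeasible-reasoning literature ("x looks red" is a reason to believe "x is red", defeated by "x is illuminated by red light"). The class and rule format are a hypothetical simplification of our own that handles single-step defaults only. This is not Pollock's own reasoning system.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class DefeasibleBeliefs:
    """Toy defeasible reasoner with single-step default rules.

    Each rule is (premise, conclusion, defeater): believe `conclusion`
    when `premise` is a known fact, unless `defeater` is also known.
    Beliefs are recomputed from the current facts, so a conclusion is
    retracted automatically once its defeater is learned.
    """

    facts: Set[str] = field(default_factory=set)
    rules: List[Tuple[str, str, str]] = field(default_factory=list)

    def beliefs(self) -> Set[str]:
        derived = set(self.facts)
        for premise, conclusion, defeater in self.rules:
            if premise in self.facts and defeater not in self.facts:
                derived.add(conclusion)
        return derived


kb = DefeasibleBeliefs(
    facts={"looks_red"},
    rules=[("looks_red", "is_red", "red_lighting")],
)
print("is_red" in kb.beliefs())   # belief formed on currently available information
kb.facts.add("red_lighting")      # new information arrives
print("is_red" in kb.beliefs())   # belief withdrawn
```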
Such cognition need not have the serial structure we associate with human reasoning.
That is slow. Humans often employ special-purpose Q&I modules (quick and inflexible 
modules) that achieve speed in part by making assumptions that are usually but not 
always true, and by employing various approximation techniques. Examples are catching 
a baseball or computing probabilities. For example, one could catch a baseball by literally 
reasoning about its trajectory, but that would be impracticably slow. The baseball would 
have landed before we figured out where to go to catch it. Instead, humans and most 
higher animals have special-purpose cognitive modules that compute trajectories rapidly, 
but work correctly only under certain circumstances (e.g., the ball is not going to bounce 
off of anything). It is arguable that most human probabilistic reasoning also proceeds in 
terms of Q&I modules rather than explicit reasoning. The outputs of Q&I modules are 
fallible, so it is important that they can be overridden by explicit reasoning. The advantage 
of explicit reasoning is that it can take all the available information into account, and it 
can deal with cases for which there are no applicable Q&I modules. But it is slow. 

Human beings do not really do a lot of reasoning. They are better modelled as bundles 
of Q&I modules with reasoning sitting on top and monitoring the output, correcting it 
where necessary and filling in the gaps when there are no applicable Q&I modules. This 
produces a real-time agent with the quick response time required for getting around in 
the world, while still allowing the agent to engage in sophisticated cognition when that 
is advantageous. This would seem to be a good architecture for artificial agents as well. 
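The layering described above can be illustrated with a rough Python sketch. A question goes to a fast special-purpose module when one exists. Slower explicit reasoning overrides the module's output when one of the module's assumptions is known to fail, and reasoning fills the gap when no module applies. The module, the wall-rebound scenario, and every function name here are hypothetical illustrations, not a description of any implemented system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Situation = Dict[str, float]

@dataclass
class QIModule:
    """Quick-and-inflexible module: fast, but correct only under built-in assumptions."""
    name: str
    question: str                                   # the kind of question it answers
    compute: Callable[[Situation], float]           # fast approximate answer
    assumption_fails: Callable[[Situation], bool]   # explicit test of those assumptions

def ballistic_landing(s: Situation) -> float:
    # Fast module (hypothetical): assumes the ball never bounces off anything.
    return s["x0"] + s["vx"] * s["t"]

def reason_landing(s: Situation) -> float:
    # Slow, general reasoning (hypothetical): also uses a known wall the ball rebounds from.
    x = s["x0"] + s["vx"] * s["t"]
    if "wall" in s and x > s["wall"]:
        x = 2 * s["wall"] - x
    return x

def reason_arrival(s: Situation) -> float:
    # A question for which no Q&I module exists.
    return s["distance"] / s["speed"]

MODULES = [QIModule("trajectory-module", "landing", ballistic_landing,
                    assumption_fails=lambda s: "wall" in s)]
REASONING = {"landing": reason_landing, "arrival": reason_arrival}

def answer(question: str, s: Situation) -> Tuple[str, float]:
    for m in MODULES:
        if m.question == question:
            quick = m.compute(s)
            # Reasoning monitors the module's output and overrides it when
            # the module's assumptions are known to be violated.
            if m.assumption_fails(s):
                return ("reasoning-override", REASONING[question](s))
            return (m.name, quick)
    # No applicable module: reasoning fills the gap.
    return ("reasoning-gap", REASONING[question](s))
```

In this sketch the module remains the default path. Reasoning is consulted only to correct it or when no module covers the question.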

The sense in which Q&I modules are a kind of defeasible cognition is that they can 
be overridden by explicit reasoning. Reasoning, on the other hand, must be internally 
defeasible. As the court of last appeal, reasoning must have the ability to correct itself. 
So a sophisticated agent with the above capabilities must engage in defeasible reasoning. 

A familiar reason to attend to the structure of human rational cognition is that it 
represents one solution to the design problem of building an autonomous agent capable 
of functioning in a complex environment. It is thus the source of insight for our own 
designs. But I am now arguing that at least the broad outlines of human cognition may 
constitute the only solution to the design problem. This, of course, is only true for 
sufficiently complex environments and agents that must exhibit human-like capabilities 
to achieve their goals. Furthermore, the details of an agent design need not be human-like. 
For example, the psychological literature strongly supports the conclusion that modus 
tollens is not a primitive rule of human reasoning. We can learn to use modus tollens 
in our reasoning, but it is not built into the human cognitive architecture. However, an 
artificial agent would not be flawed by virtue of employing it as a primitive rule. 

It must be acknowledged that the agents we are capable of building today are not yet 
sophisticated enough to need an architecture of this complexity. It was 
suggested in the panel discussion that I am talking about architectures for twenty years 
from now, but I suspect that within five years we will be trying to build agents requiring 
this kind of architecture. Even now, an agent like the Mars Lander would profit from 
such an architecture if we were able to supply it. 
6 Response by Donald Steiner 

In the following answers to the panel questions, I consider agent architecture to be the 
conceptual and internal architecture of a single agent. 

What do you consider the most essential properties and building blocks of agent architectures 
to support autonomous action in dynamic, uncertain, or real-time environments? 
The most essential property of an agent architecture is to allow flexible integration of 
components supporting interaction with the environment (including other agents, "non-agent" 
software systems, physical sensors and actuators, and humans) and management 
of this interaction (including reasoning, learning, coordination, etc.). 

From your experience, are there any rules to help designers of agent systems to decide 
which type of architecture to choose for a certain application domain (e.g., reactive, 
hybrid, layered, component-based)? The choice should rest on a reasonable analysis of the specific 
problem and requirements at hand, taking the following into consideration: 

- degree of flexibility 

- desired granularity 

- required agents and their tasks 

- available computing power 

- time required to implement. 

Are certain architectures particularly suitable / unsuitable for certain application domains? 
Many application domains have requirements that not all architectures can fulfil. 
For example, a pure BDI architecture may be too complex to use in federated database 
systems; even a hybrid system may be too large and slow for use in micro-controllers 
for robots. 

Does the agent architecture really matter, at all? As per the above comments, yes. 

How can the effectiveness of agent architectures be evaluated? For a specific application, 
it is possible to evaluate architectures with respect to the following criteria: 

- ease of design and implementation 

- efficiency of execution 

- ease of maintenance and extensibility. 

However, this is not sufficient for comparison. 

How can we compare two agent architectures? Specific features provided by the architectures 
can be listed, and conceptual comparisons of how architectures may be used in 
particular problem domains can be given. A concrete comparison of effectiveness in 
implementation and execution is harder, as specific implementations rely not only on 
the conceptual architecture, but also on the underlying platform and tools. Two different 
implementations of the same architecture may produce different results. Thus, it is more 
appropriate to compare specific agent development platforms. 
What could benchmarks for agent architectures look like? Given the above, benchmarking 
of architectures is probably not applicable. Certainly, a list of possible 
features of an agent architecture can be established (similar to the list described by 
Marcus Huber above). It is unclear, however, how relevant such a comparison would be. 

Why are there so many agent architectures? This is primarily because of a lack of 
readily available reliable agent development toolkits. The developers of agent systems 
have thus had to build their agents from scratch. Rather than choosing to adopt other 
architectures (as conceptually but not necessarily specifically described in papers), they 
have implemented their own. 

Do you think there is a need / possibility for standardisation efforts at this level? There 
is a need for standardisation in three areas: 

1. Agent Communication: The high-level language and protocols different agents use 
to interact with each other (e.g., KQML and FIPA). 

2. Agent Management: The generic life cycle of an agent and the processes by which 
agents find each other and exchange messages. (FIPA) 

3. Agent APIs: The mechanisms for incorporating components into agents (e.g., learning, 
planning). 

There is little need for standardising a specific agent-internal architecture. 
Abstract. We propose a methodology that can be used to compare and evaluate 
Artificial Intelligence architectures and is motivated by fundamental properties 
required by general intelligent systems. We examine an initial application of this 
method used to compare Soar and CLIPS in two simple domains. Results gathered 
from our tests indicate both qualitative and quantitative differences in these 
architectures and are used to explore how aspects of the architectures may affect 
the agent design process and the performance of agents implemented within each 
architecture. 



1 Introduction 

Development of autonomous intelligent systems has been a primary goal of Artificial 
Intelligence. A number of symbolic architectures have been developed to support the 
low level, domain independent functionality that is commonly required in such systems. 
Although some studies have examined what types of agent behaviors are most appropriate 
within a given domain (e.g., Pollack and Ringuette [12]), only secondary attention has 
been paid to the complementary problem of determining which computational primitives 
should be supported by the architecture and how they should be implemented. 

Researchers attempting to develop new agents are faced with a difficult problem: 
should they construct their own symbolic architecture for the tasks at hand, or should 
they reuse an existing architecture? Because symbolic architectures tend to be large 
and complex, development requires a significant amount of time and effort. This is 
especially true if the resulting architecture is intended to be flexible enough for reuse 
in diverse situations. On the other hand, because of the lack of research examining how 
different architectures support a particular type of behavior, the selection process is 
often haphazard, which, as Stylianou et al. [15] note, can lead to inefficiencies in both the 
development process and the resulting agent. 

Our goal is to develop a methodology for evaluating symbolic AI architectures and to 
examine how differences in these architectures affect both the agent design process and 
the capabilities of the resulting agent. This information will allow developers to make 
more informed decisions about the suitability of an architecture to a specific domain, 
and will enable future architectures to fill specific needs more effectively. Moreover, 
the design of any agent-based system will profit from information that indicates which 
computational primitives are most effective for a particular class of problems. This paper 
describes a methodology for conducting such evaluations and our initial experience 
applying the methodology to Soar (Laird [7]) and CLIPS [1]. 



2 Architectures, Intelligence, and Knowledge 

The class of AI symbolic architectures we are interested in comprises those that support the 
development of general, intelligent, knowledge-rich agents. Following Newell’s description 
[10], an architecture is the fixed set of memories and processing units that realize 
a symbol processing system. A symbol system supports the acquisition, representation, 
storage, and manipulation of symbolic structures. An architecture is analogous to the 
hardware of a standard computer, while the symbols (which encode knowledge) correspond 
to software. The role of a general symbolic architecture is to support the encoding 
and use of diverse types of knowledge that are applicable to various goals and actions. 
Some examples of these architectures include Atlantis (Gat [4]), CLIPS [1], O-Plan2 
(Tate [16]), PRS (Georgeff [6]), and Soar [7]. 

The basic functions performed by an architecture usually consist of the following 
(from Newell [10] p. 83): 

- The fetch-execute cycle 

  - Assemble the operator and operands 

  - Apply the operator to the operands using architectural primitives 

  - Store results for later use 

- Support access structures 

- Input and output 

Architectures are distinguished by their implementation of these functions, and the 
specific set of primitive operations supported. For example, many architectures choose 
the next operator and operand by organizing their knowledge as sequences of operators 
and operands, incrementing a program counter to select the next operator. They also have 
additional control constructs such as conditionals and loops, but depend on the designer 
to organize the knowledge so that it is executed in the correct order. Other architectures, 
such as rule-based systems, examine small units of knowledge in parallel, selecting an 
operator and operands based on properties of the current situation. Architectures are 
often further distinguished by the inclusion of additional functions, such as interruption 
mechanisms, error-handling methods, and goal mechanisms. 
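The two selection styles contrasted above can be illustrated with the following Python sketch, in which a sequential program and a single rule produce the same behavior. The instruction format, the rule representation, and the counter task are hypothetical. Neither function models a real architecture.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Instruction = Tuple[str, str, int]          # (operator, register, operand)

def run_program(program: List[Instruction], state: State) -> State:
    """Sequential style: a program counter selects the next operator and operand."""
    pc = 0
    while pc < len(program):
        op, reg, arg = program[pc]          # assemble operator and operands
        if op == "add":
            state[reg] += arg               # apply the operator
        elif op == "jump_if_less" and state[reg] < arg:
            pc = 0                          # designer-supplied loop: back to the start
            continue
        pc += 1                             # increment the program counter
    return state

Rule = Tuple[Callable[[State], bool], Callable[[State], None]]

def run_rules(rules: List[Rule], state: State, max_cycles: int = 1000) -> State:
    """Rule-based style: each cycle tests every rule's conditions against the
    current situation and applies the action of the first rule that matches."""
    for _ in range(max_cycles):
        matching = [action for condition, action in rules if condition(state)]
        if not matching:
            break
        matching[0](state)
    return state

def add_two(state: State) -> None:
    state["x"] += 2

program = [("add", "x", 2), ("jump_if_less", "x", 10)]
rules = [(lambda s: s["x"] < 10, add_two)]
```

In the sequential version, the designer must order the knowledge and supply the loop. In the rule-based version, the condition `x < 10` does that work by inspecting the current situation.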



3 Evaluation Methodologies 

As analysts, we are faced with a difficult question: how should we evaluate architectures and 
their corresponding functions? A first approach would be to try to compare architectures 
based on which types of knowledge they can encode, or what functions they directly 
support, possibly finding tasks that one architecture can be used for, but another cannot. 
Evaluations that follow this approach often focus on high level categorical differences, 
which are usually motivated by examining the demands of a certain class of application 
domains. In this type of evaluation, differences such as support for forward and backward 
chaining, the availability of a GUI, or the ability to maintain real-time commitments 
may be assessed (e.g., Gevarter [5], Lee [8], Mettrey [9], Stylianou [15]). Because most 
architectures are Turing complete, they can often achieve identical functionality by 
interpretation with the addition of knowledge. For example, hierarchical planning systems 
are likely to have special architectural constructs to support the semantics of goals and 
actions. Simple production systems, like OPS5, however, can be designed to implement 
the same behavioral strategies as hierarchical planners with the addition of productions 
that encode knowledge about the missing constructs. However, interpretation has two 
potential costs: an increase in execution time, and an increase in knowledge required 
to specify the new behavior. On the other hand, adding functionality to the architecture 
itself is also likely to introduce complementary performance overheads. For example, an 
agent that is built upon a specialized hierarchical planning architecture may not be able 
to excise the superfluous components. As a result, it may end up paying a performance 
penalty in simple domains where hierarchical planning is not exploited. It is an empirical 
question which cost dominates, and the costs might be different for different 
mixtures of knowledge, goals, and possible actions. Therefore, our methodology will 
include measuring both the required knowledge and required execution time of different 
agents in multiple domains. 

A second possible approach to architecture evaluation is to create a set of benchmark 
tasks and use independent teams of expert programmers to implement the tasks on each 
architecture. Evaluation occurs by directly comparing these implementations to one 
another (Schreiber [14]). Unfortunately, this approach can confound the contribution 
of the architectures with that of the knowledge encoded in them. For 
example, if we have two teams of robot soccer players and one soundly beats the other, 
is it because the one that won had a better architecture, or is it because the programmers 
encoded better programs? Without the ability to examine how programs are implemented 
at a fine grained level, it is difficult or impossible to determine. Gaining this ability, 
however, is typically at odds with using complex, real-world problems as benchmarks. 

Plant and Salinas [11] attempted to avoid the problem of confounding the contribution 
of knowledge and architecture by providing an extremely controlled design specification 
for their benchmark that reduced the differences between implementations almost 
completely to minor syntactic variations. Although this helps ensure that knowledge remains 
consistent between implementations, it suffers from its own drawback. Adhering 
to this approach is likely to result in examining only those implementation methods 
that are commonly available. This means that architecturally specific properties, such as 
control mechanisms, may not be fully explored and the architecture as a whole may be 
misrepresented by the test’s results. 

To avoid these problems, we propose a methodology where knowledge is encoded 
in two architectures so that the resulting systems produce the same behavior. As part of 
this methodology, it is critical that the agents are exposed to identical situations and thus 
receive identical input from their sensors. In many cases, especially simple ones, this is 
straightforward. In cases where the agent’s input or output contains stochastic elements, 
however, the processes that govern this behavior must be replaced with computationally 
equivalent deterministic processes to allow the agent’s exact behavior to be reproduced 
on each iteration of the experiment. This modification is necessary because an agent’s 
performance may vary drastically between iterations if nondeterministic elements are 
used. 
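One way to meet this requirement, not necessarily the one used here, is to route every random choice through a pseudo-random generator with a fixed seed. Such a generator is itself a deterministic process. The Python sketch below assumes that the environment's only stochastic element is food placement. The `GridEnvironment` class and its parameters are hypothetical.

```python
import random
from typing import List, Tuple

Cell = Tuple[int, int]

class GridEnvironment:
    """Hypothetical grid world whose only stochastic element is food placement."""

    def __init__(self, seed: int, size: int = 10):
        # A private, seeded generator: the same seed always yields the same
        # sequence of placements, and the global `random` state is untouched.
        self.rng = random.Random(seed)
        self.size = size

    def place_food(self, n: int) -> List[Cell]:
        return [(self.rng.randrange(self.size), self.rng.randrange(self.size))
                for _ in range(n)]

def percept_trace(seed: int, steps: int) -> List[List[Cell]]:
    """Record what an agent would be shown on each step of one run."""
    env = GridEnvironment(seed)
    return [env.place_food(3) for _ in range(steps)]

# Each architecture's agent is run against the same seed, so both see
# identical input on every iteration of the experiment.
trace_a = percept_trace(seed=42, steps=5)
trace_b = percept_trace(seed=42, steps=5)
```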

Thus, we do not intend to measure which architecture is able to generate the best 
solution, but instead measure which architecture uses fewer resources in terms of the 
amount of knowledge required to achieve a specified behavior and the amount of time 
required to generate that behavior. Minimizing knowledge is advantageous because the amount of 
knowledge gives a first-order approximation of how much effort is required to encode (or learn) 
the knowledge for a task, which is directly related to how well the architecture supports 
the knowledge level. By measuring the execution time, we are measuring how fast the 
architecture can employ knowledge, which in turn gives a measure of the cost of interpretation (if 
it is required) or the overhead from other architectural features. A third measure that 
would also be useful would be memory space; however, we have concentrated on the 
first two in our initial investigations. 
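The two measures could be collected by a small harness such as the following Python sketch. The harness counts an agent's rules as a crude proxy for knowledge. It measures CPU time with `resource.getrusage`, Python's wrapper around the `getrusage` system call; that module is available only on Unix-like systems. The agent interface (a list of rules plus a `run` callable) is a hypothetical stand-in.

```python
import resource
from typing import Callable, Sequence, Tuple

def cpu_seconds() -> float:
    """User plus system CPU time consumed so far by this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_utime + usage.ru_stime

def profile(rules: Sequence[object], run: Callable[[], object],
            repetitions: int = 5) -> Tuple[int, float]:
    """Return (amount of knowledge, mean CPU seconds per run) for one agent."""
    knowledge = len(rules)          # crude first-order proxy: number of rules
    start = cpu_seconds()
    for _ in range(repetitions):
        run()
    return knowledge, (cpu_seconds() - start) / repetitions

# Hypothetical agent: three rules and a run function standing in for a full episode.
agent_rules = ["propose-move", "prefer-food", "apply-move"]
rule_count, mean_cpu = profile(agent_rules, lambda: sum(range(200_000)))
```

Measuring CPU time rather than wall-clock time reduces, but does not eliminate, interference from other processes on the same machine.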

Application of these metrics will yield information on how the architecture as a whole 
supports different behaviors. This is done by first decomposing an abstract architecture 
into a set of capabilities that can be used to develop agents for different domains. Because 
each architecture provides different computational primitives, the mixture of native 
constructs and interpreted knowledge required to implement each capability will vary. 
An architecture’s performance profile indicates how well each capability is supported 
through interpretation of knowledge and the use of architecturally native mechanisms. 
Thus, although divergent architectures may be best suited to implement a subset of the 
possible capabilities, and may be unable to support others, in theory our methodology could 
be applied to any universal architecture. Moreover, it allows comparisons between 
very different architectures through examination of their performance profiles. 

From the agent’s perspective, selection and application of operators and operands 
constitute decisions that must be made and acted upon. We identified five methods that 
can be used to implement the decision making process. These methods differ in terms of 
what types of knowledge they use and how their knowledge is organized. In general, two 
types of knowledge are used in decision making: applicability knowledge, and control 
knowledge. The first of these determines when an action can be applied to a particular 
state (e.g., by specifying a minimal set of conditions that must be met for the action to be 
feasible). The second type of knowledge determines when an action should be applied to a 
particular state (e.g., by specifying a maximally specific set of conditions that must be 
met for the action to be pursued). The five methods described below are linearly ordered 
such that control knowledge is increasingly segregated from applicability knowledge. 
Agents that do not attempt to differentiate control and applicability knowledge do not 
benefit as much from knowledge generalization. In the extreme, such agents will have 
difficulty acquiring new rules (e.g., through learning) because of the specificity required 
to properly encode their conditions. Moreover, large systems designed in this manner 
may not function because computational resources are exceeded. 

The methods below are not meant to be completely comprehensive; rather, they are 
part of the total set of decision making methods that in the limit include selections based 
on analogy, utility, planning, etc. 
1. Mutually Exclusive Actions: Both applicability and control knowledge are merged 
so that actions are considered only if they should be selected. To prevent race 
conditions, the programmer must ensure that the conditions of each action are mutually 
exclusive. 

2. Segregation of Control Knowledge: Control knowledge is somewhat segregated 
from applicability knowledge, but the control knowledge is restricted to situation 
independent selection heuristics. Thus, situation specific control knowledge is not 
possible. 

3. Two Phase Decision Process: Two distinct phases occur in the decision process. 
During the first, all possible actions are proposed, and during the second phase 
one of these actions is selected and then applied. This allows the preconditions for 
considering actions to be less specific than in either of the previous cases, and allows 
situation-dependent control knowledge to be used for selecting the best action. 

4. Three Phase Decision Process: Actions are first proposed, then evaluated, and finally 
selected. Three phases allow knowledge to be segregated further. This creates 
four general classes of knowledge that each rule may encode: proposal, preference, 
selection and application. 

5. Goal Constrained Selection: The agent uses high level goals to pursue problem 
solving and to control search. This includes standard means-ends analysis where an 
action may be selected even when it cannot immediately apply, and its selection 
can be used to constrain the selection of subsequent actions. 
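The difference between merged and segregated knowledge can be illustrated with the Python sketch below. It contrasts method 1 with a propose/evaluate/select cycle in the spirit of method 4. The state fields, the actions, and the integer preference scores are hypothetical. The scores stand in for whatever preference scheme an architecture actually provides.

```python
from itertools import product
from typing import Callable, Dict, List

State = Dict[str, bool]

# Method 1: applicability and control knowledge merged; the designer must keep
# the conditions mutually exclusive so that exactly one action can fire.
EXCLUSIVE: Dict[str, Callable[[State], bool]] = {
    "eat":    lambda s: s["hungry"] and s["food_nearby"],
    "rest":   lambda s: s["tired"] and not (s["hungry"] and s["food_nearby"]),
    "wander": lambda s: not s["tired"] and not (s["hungry"] and s["food_nearby"]),
}

def mutually_exclusive(state: State) -> str:
    chosen = [a for a, condition in EXCLUSIVE.items() if condition(state)]
    if len(chosen) != 1:
        raise RuntimeError(f"expected exactly one applicable action, got {chosen}")
    return chosen[0]

# Method 4 style: applicability knowledge says only when an action CAN be taken...
PROPOSE: Dict[str, Callable[[State], bool]] = {
    "eat":    lambda s: s["food_nearby"],
    "rest":   lambda s: True,
    "wander": lambda s: True,
}

# ...while separate control knowledge says how good each candidate is here.
PREFER: List[Callable[[State, str], int]] = [
    lambda s, a: 3 if a == "eat" and s["hungry"] else 0,
    lambda s, a: 2 if a == "rest" and s["tired"] else 0,
    lambda s, a: 1 if a == "wander" else 0,          # weak default
]

def three_phase(state: State) -> str:
    candidates = [a for a, can in PROPOSE.items() if can(state)]        # propose
    score = {a: sum(p(state, a) for p in PREFER) for a in candidates}   # evaluate
    return max(candidates, key=score.__getitem__)                       # select

# The two agents behave identically on every combination of inputs.
STATES = [dict(zip(("hungry", "food_nearby", "tired"), bits))
          for bits in product([False, True], repeat=3)]
```

The proposal conditions for `rest` and `wander` are much less specific than their method 1 counterparts. This is the generalization benefit attributed above to segregating control knowledge.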

Although our ultimate goal is to apply this methodology to a wide variety of architectures, 
we need to start somewhere. Given the potential pitfalls inherent to empirical 
evaluation, it behooves us to start with architectures that are not radically different. Given 
our experience with Soar, it made sense to pick another rule-based system. CLIPS 
was an obvious choice given its wide use and free availability. The following section 
describes the Soar and CLIPS architectures and provides a motivation for examining 
particular components of these systems. Our implementation of this methodology with 
respect to Soar and CLIPS is discussed in Section 5. 



4 Architectural Description 

Soar and CLIPS are similar in many respects; they are both forward-chaining production 
systems based on the Rete matching algorithm (Doorenbos [2], Forgy [3]) and 
implemented in the C programming language. Our evaluation, however, focused on 
four critical architectural properties in which overt distinctions could be observed: 
knowledge representation, knowledge access, knowledge deployment, and decision making. 
Differences in knowledge representation, access, and deployment affect the 
expressiveness of the design language, and this contributes to the relative ease or difficulty of 
producing certain behaviors. Similarly, a less general decision making process may 
result in less robust behavior. The following two sections describe the CLIPS and Soar 
architectures with respect to these issues. 
4.1 CLIPS 

In CLIPS, short term declarative knowledge is stored as a list of facts¹. A fact consists of 
a name and a series of slots that can either be ordered or unordered (Figure 1). Facts with 
ordered slots can be thought of as lists in which the first element is the name. A series of 
values follows the name, and the semantics of a particular entry is determined implicitly 
by its position. Alternatively, unordered facts can be used to explicitly define a fact’s 
attributes and values. An unordered fact has a series of named slots, each of which can be 
set to accept either a single value or multiple values. Neither of these options, however, 
provides a good mechanism for encoding data objects that are incompletely specified. In 
this case, unused slots must be filled with placeholders, and system resources 
must still be devoted to their representation. 



Short Term Memory 

(month January) 
(day Saturday) 
(task (name get-car) 
      (priority 2)) 

A Matching Rule 

(defrule when-to-get-car 
   (month January) 
   (task (name get-car)) 
   => 
   (assert (do (what get-car) 
               (when now)))) 

Fig. 1. Knowledge Representation in CLIPS 



Long term knowledge is stored as rules that contain conditions on the left hand side, 
and consequences on the right hand side (Figure 1). Rules become activated when all of 
their conditions are matched by the current contents of short term memory (facts). CLIPS 
syntax allows a wide variety of conditions to be expressed. For example, fact values can 
be variablized and required to satisfy arbitrary predicates, and both conjunction and 
disjunction of conditions are accepted. The right hand side of rules can be used to change 
the contents of short term memory or to invoke procedural knowledge through a variety of 
mechanisms. 

CLIPS deploys knowledge by firing productions serially. During execution, matching 
rules are placed on the agenda, and the rule on the top of the agenda is fired. This results 
in a recalculation of matching rules, and thus a possible modification to the agenda. 
Two factors contribute to a matching rule’s position in the agenda. The first of these is 
salience, which defines a rule-level preference. The salience value may be computed 
before or during execution; rules with higher values are placed closer to the top of the 
agenda. The second mechanism is called the search-strategy (the conflict resolution strategy in the CLIPS documentation). This defines how rules of 
equal salience are placed relative to each other. CLIPS offers many possibilities for this 
setting including depth-first, breadth-first and random. 
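The ordering described above can be approximated by the following Python sketch of an agenda. Activations are ordered by salience first. Ties are broken by a strategy: depth-first places the newest activation first, and breadth-first places the oldest first. This is a simplification for illustration, not CLIPS's implementation. It omits the rematching that follows each firing and the other strategies CLIPS offers. The rule names are hypothetical.

```python
import heapq
from itertools import count
from typing import List, Tuple

class Agenda:
    """Simplified agenda: higher salience first; ties broken by a strategy."""

    def __init__(self, strategy: str = "depth"):
        if strategy not in ("depth", "breadth"):
            raise ValueError("only 'depth' and 'breadth' are sketched here")
        self.strategy = strategy
        self._heap: List[Tuple[int, int, str]] = []
        self._clock = count()                     # activation order

    def activate(self, rule: str, salience: int = 0) -> None:
        t = next(self._clock)
        # heapq is a min-heap, so salience is negated. Depth-first favors the
        # newest activation among equals; breadth-first favors the oldest.
        tie = -t if self.strategy == "depth" else t
        heapq.heappush(self._heap, (-salience, tie, rule))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)

def firing_order(strategy: str) -> List[str]:
    agenda = Agenda(strategy)
    for rule, salience in [("a", 0), ("b", 0), ("urgent", 10), ("c", 0)]:
        agenda.activate(rule, salience)
    return [agenda.pop() for _ in range(len(agenda))]
```

Under both strategies the high-salience rule fires first. Only the order among equal-salience activations changes.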

CLIPS does not provide architectural mechanisms specifically to support all of the 
decision making processes listed earlier. As a result, the more complex schemes must 
be implemented through the use of additional knowledge. 

¹ CLIPS also allows short term knowledge to be stored as objects via the CLIPS Object-Oriented 
Language (COOL); however, this was not examined in our experiment. 
4.2 Soar 

Soar’s short term memory is a graph that is organized as a set of states (Figure 2). Each 
state has an arbitrary number of slots and corresponding values. Slots can be named with 
any alphanumeric string, and their values can be either a symbol or a link to another 
slot. The states are hierarchically organized and linked together with the architecturally 
created superstate slot that points to the parent state. 



Short Term Memory 

A Matching Rule 

sp {when-to-get-car 
   (state <s> ^month January 
              ^day Saturday 
              ^task <t>) 
   (<t> ^name get-car) 
--> 
   (<s> ^do <d>) 
   (<d> ^what get-car 
        ^when now)} 

Fig. 2. Knowledge Representation in Soar 



Like CLIPS, Soar stores long term knowledge as rules whose left hand sides contain 
a series of conditions that match against the current contents of short term memory. 
Although Soar supports standard numeric predicates such as <, >, and =, as well as some 
for determining set membership, it does not allow user-defined predicates or functions 
to be used in the left hand side of a rule. Such matching requires at least two distinct 
rules and two rule firings. Soar does, however, allow an attribute’s name, as well as its 
value, to be bound to a variable, a property that can reduce rule complexity in some 
situations. Similar to CLIPS, the right hand side of a Soar rule can be used to modify 
the contents of short term memory, and to invoke external procedural knowledge. Some 
rules, however, are used to take advantage of Soar’s built in decision making scheme 
by proposing architectural constructs called operators, or by determining the relative 
suitability of a set of operators. 
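Soar's working memory consists of (identifier, attribute, value) elements, so a condition can bind the attribute itself to a variable. The simplified Python matcher below illustrates that property on the contents of Fig. 2. The identifiers `S1` and `T1`, the `<var>` convention borrowed from Soar syntax, and the `match` function are assumptions for illustration. This is not Soar's Rete-based matcher.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]        # (identifier, attribute, value)
Bindings = Dict[str, str]

# The working memory of Fig. 2, flattened into triples (identifiers are made up).
WM: List[Triple] = [
    ("S1", "month", "January"),
    ("S1", "day", "Saturday"),
    ("S1", "task", "T1"),
    ("T1", "name", "get-car"),
]

def is_var(term: str) -> bool:
    return term.startswith("<") and term.endswith(">")

def match(pattern: Triple, wm: List[Triple],
          bindings: Optional[Bindings] = None) -> List[Bindings]:
    """Return every extension of `bindings` under which `pattern` matches a
    triple. Any field, including the attribute, may be a variable like <a>."""
    results = []
    for triple in wm:
        b = dict(bindings or {})
        for term, value in zip(pattern, triple):
            if is_var(term):
                if b.setdefault(term, value) != value:
                    break                  # variable already bound to something else
            elif term != value:
                break                      # constant does not match
        else:
            results.append(b)
    return results

# One condition with a variable attribute enumerates every attribute of S1.
attributes = sorted(b["<a>"] for b in match(("S1", "<a>", "<v>"), WM))
```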

Unlike most production systems, Soar deploys its long term knowledge by firing all 
activated rules in parallel. Once a matching set of rules is calculated, all rules are fired and 
their consequences are carried out. This sequence, which is repeated until exhaustion, 
is called the elaboration phase. Parallel rule firing offers two potential benefits over serial 
rule firing. First, all relevant knowledge is brought to bear in each situation, 
allowing Soar to consider all relevant paths before making a commitment. Second, 
parallel rule firing obviates the need for a rule-level conflict resolution mechanism such 
as CLIPS’s search strategy. This means that Soar forces control knowledge to be explicitly 
represented in the rules and helps free the developer from worrying about race conditions. 
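The elaboration phase can be sketched as repeated waves of parallel firing. Every rule that matches the current memory fires, all of their additions are applied together, and the waves repeat until nothing new is added. The Python below is a hypothetical simplification that only adds facts; real Soar also removes and modifies working memory elements. The facts and rules are illustrative.

```python
from typing import Callable, FrozenSet, List, Set, Tuple

Fact = str
Rule = Callable[[FrozenSet[Fact]], Set[Fact]]   # facts the rule would add

def elaborate(memory: Set[Fact], rules: List[Rule]) -> Tuple[Set[Fact], int]:
    """Fire every matching rule in parallel, wave after wave, until quiescence.
    Returns the final memory and the number of waves that added something."""
    memory = set(memory)
    waves = 0
    while True:
        snapshot = frozenset(memory)
        # Every rule sees the same snapshot, so rule order cannot matter.
        additions: Set[Fact] = set()
        for rule in rules:
            additions |= rule(snapshot)
        additions -= memory
        if not additions:
            return memory, waves
        memory |= additions
        waves += 1

RULES: List[Rule] = [
    lambda m: {"winter"} if "month=January" in m else set(),
    lambda m: {"roads-icy"} if "winter" in m else set(),
    lambda m: {"propose get-car"} if "task=get-car" in m else set(),
]
final, waves = elaborate({"month=January", "task=get-car"}, RULES)
```

Reversing the rule list yields the same result. In this simplified setting, that order-independence is why no conflict resolution step is needed.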

Soar supports decision making with the operator construct and an architecturally 
supported decision cycle. Operators represent high level actions or goals and are selected 
and applied in serial order. This is supported by Soar’s decision cycle, which occurs 
in a series of phases. After Soar completes the elaboration phase, it examines which, 
if any, operators have been proposed, and computes their relative preferences. Finally, 
the best operator is selected and the cycle ends. During subsequent cycles, the primitive 
actions of the selected operator will be carried out and new operators will be proposed. 
At certain times, Soar will be unable to choose an appropriate operator, or will not know 
immediately what actions are required to achieve the goals of the selected operator. 
In such instances, the architecture will form a substate in which more primitive reasoning 
can occur. This hierarchical nature facilitates problem solving at multiple levels of 
abstraction. 
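The selection step and the fall-back to a substate can be sketched in Python as follows. Only two preference types are modeled, loosely named after Soar's reject and best preferences. When they do not single out one operator, the function reports an impasse, which is the point at which Soar would form a substate. The function and data layout are hypothetical, and Soar's actual preference semantics are considerably richer.

```python
from typing import Dict, List, Optional, Set, Tuple

def decide(proposed: List[str],
           prefs: Dict[str, Set[str]]) -> Tuple[Optional[str], Optional[str]]:
    """Return (selected operator, impasse); exactly one of the two is None."""
    rejected = prefs.get("reject", set())
    candidates = [op for op in proposed if op not in rejected]
    if not candidates:
        return None, "state no-change"        # nothing to choose from
    best = [op for op in candidates if op in prefs.get("best", set())]
    pool = best or candidates
    if len(pool) == 1:
        return pool[0], None
    return None, "tie"                         # Soar would form a substate here
```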



4.3 Summary of Architectural Differences 

We have outlined four areas (knowledge representation, knowledge access, knowledge 
deployment, and decision making) in which the Soar and CLIPS architectures are significantly 
different. Knowledge representation in CLIPS is restricted to list-like structures, 
whereas Soar supports more general graph structures. Knowledge is accessed by conditions 
on the left hand sides of rules in both systems. CLIPS provides the ability to 
use arbitrary function or predicate calls as part of the matching process, whereas this 
requires two distinct rules in Soar. On the other hand, Soar allows slot names, as well 
as slot values, to be variablized and matched. Replicating this mechanism in CLIPS is 
difficult. Knowledge is deployed via rule firings in both systems, but Soar fires all matching 
rules in parallel, whereas CLIPS fires rules serially. Finally, Soar supports decision 
making with a three phase process in which all relevant knowledge is examined and 
possible actions are proposed, the relative preferences are then examined and an action 
is selected. Because CLIPS has no comparable mechanism, this type of strategy must 
be implemented with rules. 



5 Comparing the Architectures 

In this section, we outline how our methodology was applied to these two systems and 
provide an analysis of the reliability of our timing results. 

Remember, the goal is not to determine which is the single best architecture. Our goal 
is to develop a methodology that can be used to answer questions such as: for this class 
of knowledge and goals, what are the costs of using each architecture? As we present the 
results, it will become apparent that we have only scratched the surface in our comparison 
of Soar and CLIPS, and that many of the possible “strengths” of Soar have yet to be 
explored. 



5.1 Implementation 

We implemented the methodology described in Section 3 by examining pairs of agents
(each implemented in a different architecture) that used the decision-making strategies
described previously. Recall that our methodology specified that the agents encode the
same knowledge and use the same strategies, are exposed to the same stimuli, and
exhibit the same behavior.
5.2 Task Environments

Tasks were created to test the architectures in two simple domains. Limiting our initial
investigations to simple environments, and thus simple agents, allowed us to examine the
programs closely and thus avoid programmer bias. The first environment we used was
the classic Towers of Hanoi problem; the second was an interactive environment called
Eaters, which resembles the arcade game Pac-Man. Both environments are deterministic
and discrete, as defined by Russell and Norvig [13], but the environment state in the
Towers of Hanoi problem is completely represented within the agent’s memory, whereas
the sensors of the Eaters agent provide only limited information about the environment.

The Towers of Hanoi puzzle consists of three pegs and a number of disks, each with a
distinct diameter. Initially, the disks are all stacked on one peg, with the largest at the
bottom and the smallest at the top. The goal of the puzzle is to move the stack to a
different peg. The only restrictions are that only one disk can be moved at a time, and a
disk may never be placed on top of a smaller disk.
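For reference, the optimal solution that the agents reproduce can be written as the standard recursive procedure below, together with a checker that enforces the two restrictions. This is a Python sketch of the puzzle itself, not of either architecture's rule encoding; the peg numbering (0 to 2) and the function names are our own.

```python
from typing import List, Tuple

Move = Tuple[int, int]  # (from_peg, to_peg), pegs numbered 0..2


def solve(n: int, src: int = 0, dst: int = 2, via: int = 1) -> List[Move]:
    """Optimal recursive solution: move n-1 disks aside, move the largest, restack."""
    if n == 0:
        return []
    return solve(n - 1, src, via, dst) + [(src, dst)] + solve(n - 1, via, dst, src)


def apply_moves(n: int, moves: List[Move]) -> List[List[int]]:
    """Replay moves from the initial stack, enforcing the puzzle's restrictions."""
    pegs = [list(range(n, 0, -1)), [], []]   # largest disk at the bottom of peg 0
    for src, dst in moves:
        if not pegs[src]:
            raise ValueError(f"no disk on peg {src}")
        disk = pegs[src][-1]                  # only the top disk can be moved
        if pegs[dst] and pegs[dst][-1] < disk:
            raise ValueError(f"cannot place disk {disk} on smaller disk {pegs[dst][-1]}")
        pegs[dst].append(pegs[src].pop())
    return pegs


moves = solve(9)
print(len(moves))              # 511
print(apply_moves(9, moves))   # [[], [], [9, 8, 7, 6, 5, 4, 3, 2, 1]]
```

The optimal solution takes 2^n - 1 moves, so the 9-disk instance requires 511 moves.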

The Eaters domain consists of one or more Eaters (agents) that move around a grid-based
world eating food pellets. Each Eater perceives only locations within a small radius,
yielding incomplete information about the world. The Eaters environment supports a
generalized agent communication protocol that both Soar and CLIPS use to provide the
agent’s input and output functions.

5.3 Benchmarking 

A critical component of our analysis involves measuring the time required by each
architecture to use the knowledge that its agents encode. To do this, an accurate timing
mechanism is needed. Our timers are implemented with the getrusage system call,
which returns the CPU time used by a particular process.

The times that we report in the following sections are referred to as kernel times.
The kernel timer is turned off during most of the agent’s input and output functions;
within these functions, it is turned on only while assertions or retractions are being
made to the agent’s short-term memory. Examining kernel time rather than total CPU
time ensures that differences in how well one architecture’s input/output functions are
optimized do not influence the results.
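A toggled CPU-time timer of this kind can be sketched in Python with the standard `resource` module, which wraps `getrusage` on Unix systems. The class below is illustrative only: summing user and system time (`ru_utime + ru_stime`) is our assumption about what counts as CPU time, and `KernelTimer`, `on`, and `off` are hypothetical names.

```python
import resource
from typing import Optional


class KernelTimer:
    """Accumulates process CPU time only while switched on (illustrative sketch)."""

    def __init__(self) -> None:
        self.total = 0.0                    # accumulated CPU seconds
        self.cycles = 0                     # on/off cycles, kept for false-time correction
        self._start: Optional[float] = None

    @staticmethod
    def _cpu_seconds() -> float:
        usage = resource.getrusage(resource.RUSAGE_SELF)
        return usage.ru_utime + usage.ru_stime   # user + system CPU time (assumption)

    def on(self) -> None:
        self._start = self._cpu_seconds()

    def off(self) -> None:
        if self._start is None:
            raise RuntimeError("timer switched off while not running")
        self.total += self._cpu_seconds() - self._start
        self.cycles += 1
        self._start = None


timer = KernelTimer()
timer.on()
checksum = sum(i * i for i in range(200_000))  # stand-in for a match/fire cycle
timer.off()
# ...the agent's I/O handling would run here, untimed...
print(timer.cycles, round(timer.total, 3))
```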

Although the timers we implemented provide much higher reliability and accuracy 
than measurements of total CPU time, any timing mechanism is subject to errors that 
may skew the results. We identified a number of possible sources that may impact the 
timing results. These potential pitfalls and our methods of dealing with them are listed 
below: 

- False Time Reports: Ideally, toggling a timer on and then immediately off would
accumulate no time, since each of these actions would be atomic and there would be no
delay between the execution of the instructions. In reality, the function calls that turn
timers on and off are not atomic, so each on/off cycle accumulates a small amount of
false time, $T_f$, which becomes significant when the timers are toggled repeatedly. All
timing results were therefore normalized by counting the number of kernel timer cycles,
$K_c$, and subtracting $K_c T_f$ from the reported kernel time.
- Terminal I/O: Output to the display (or to a file) has a significant impact on the
kernel time values. All results that we report were obtained without such input/output,
or with only a single output statement indicating the termination of the task.

- User Interface: Differences in the implementation of the user interface were observed
to affect timing results in both architectures. For the tasks examined, changing from a
GUI to a command-line interface improved CLIPS performance by more than a factor of
two, compared with only about 15% for Soar. As a result, the measurements we report
use the CLIPS command-line interface but retain Soar’s standard Tcl interface.

- Other Factors: It is possible that factors unknown to us affected the results of our
tests. In particular, little attention was paid to compile-time options, and only slight
modifications were made to the standard Makefiles when building the executables.²
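The false-time correction can be sketched as follows. The text does not say how the false time per cycle was measured, so the calibration here (averaging many empty on/off cycles) is an assumption, as are the function names; the correction itself is the subtraction of (number of cycles) × (false time per cycle) described under False Time Reports.

```python
import resource


def cpu_seconds() -> float:
    """User + system CPU time of this process, via getrusage (Unix only)."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_utime + usage.ru_stime


def estimate_false_time(trials: int = 10_000) -> float:
    """Estimate the false time as the mean CPU time of an empty on/off cycle.

    getrusage has coarse resolution, so most individual cycles read as zero;
    averaging over many trials spreads the occasional clock tick across them.
    """
    accumulated = 0.0
    for _ in range(trials):
        start = cpu_seconds()                  # timer switched on
        accumulated += cpu_seconds() - start   # switched off with nothing timed
    return accumulated / trials


def normalize(reported: float, cycles: int, false_time: float) -> float:
    """Corrected kernel time: reported time minus cycles * false time per cycle."""
    return reported - cycles * false_time


t_f = estimate_false_time()
print(f"estimated false time per cycle: {t_f:.2e} s")
print(normalize(2.0, 1_000, 0.0005))  # hypothetical reported time and cycle count
```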

The following two sections present the experiments conducted in the Towers of Hanoi
and Eaters domains and their results. In all cases, the experiments were conducted on a
Sun UltraSPARC with 128 MB of RAM running Solaris 2.5.1. The times reported are
kernel times, and we have attempted to ensure their accuracy and reproducibility with
the methods described above.



6 Towers of Hanoi 

Within the Towers of Hanoi (ToH) domain, we implemented a set of five agents, each of
which uses a different set of decision-making knowledge. Although the internal
differences among these classes of agents are significant, their external behavior is
identical: they all implement the optimal strategy for solving the puzzle (that is, they
require only the minimal number of moves).

Figure 3 shows the performance results for both Soar and CLIPS agents over the range
of problem-solving strategies we examined. Qualitatively, the shapes of the performance
curves are well matched between the two systems, indicating that, at least in simple
domains, CLIPS is fast enough to emulate more robust decision mechanisms, such as those
natively implemented in Soar, without a serious cost to performance. Two exceptional
points on the curve occur at either end of the decision-mechanism spectrum. On the
left-hand side, Soar performs less well than CLIPS, and even less well compared to a Soar
agent using more segregated control knowledge. This difference may be attributable to the
fact that Soar agents that segregate control knowledge by using the operator construct
are able to constrain rule matching to improve overall performance. On the right-hand
side of the plot, two Soar agents have been implemented.

² Two compile-time options were modified in Soar’s default Makefile. The first prevents
Soar from gathering unnecessary statistics about its memory usage, and the second results
in the incorporation of only the timers that we implemented in CLIPS (kernel time and
total CPU time).

Fig. 3. Towers of Hanoi Performance (9 Disks)

The first of these (1) outperforms half of all Soar agents tested, whereas the second
agent (2) is more than a factor of two slower than its counterpart. Somewhat surprisingly,
these agents differ only in how goals are represented. Agent 2 uses Soar’s built-in subgoal
mechanism, whereas agent 1 directly parallels the CLIPS implementation and uses rules
to maintain a stack of goals. This illustrates one of the potential trade-offs that arise
when functionality is built into an architecture but is not fully used. In this particular
example, using Soar’s architectural subgoal stack is not beneficial because an extremely
large number of goals are being produced. Each time a subgoal is created or destroyed,
some processing is done to support Soar’s learning mechanism, even in cases such as this
one, where learning is explicitly turned off. Agent 1 avoids this cost by maintaining its
own data structure to represent the goal stack. The significant difference between the
performance of these two implementations has led to the development of a new version
of Soar, which we call Soar-Lite. Soar-Lite is a streamlined version of Soar that sacrifices
some costly, less frequently used features in order to achieve an overall performance gain.
The main difference between the current version of Soar-Lite and Soar is the removal of
Soar’s learning mechanism. The agent code for (2) was rerun in the Soar-Lite architecture
and ran more than three times faster. The performance difference is substantial only for
the agent that uses architectural subgoaling; this data point is plotted in Figure 3 and
labeled 2′. For all other data points, the performance difference is less than 0.15 seconds.
The results of this series of tests indicate that there is a similar performance trade-off
when using more complex decision mechanisms in both Soar and CLIPS. At the same time,
they also indicate that very little penalty is incurred for Soar’s architecturally supported
decision cycle and that the cost of subgoaling can be canceled by removing Soar’s learning
mechanism. Moreover, as the complexity of the domain increases, we expect that
architectural support for these features will provide a higher degree of efficiency than a
rule-based counterpart.
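The difference between agents 1 and 2 lies in where the goal stack lives. The Python sketch below mimics the approach of agent 1 by keeping goals on an explicit stack rather than relying on recursion or an architectural subgoal mechanism; it illustrates the idea only, not the Soar or CLIPS rule encoding, and the goal tuple format is our own assumption. Pushing the largest-disk move before the subgoal that precedes it is solved also shows what it means to commit to an action before it can be carried out.

```python
from typing import List, Tuple

# Goals are tuples: ("transfer", n, src, dst, via) is a compound goal and
# ("move", src, dst) is a primitive action on the top disk of peg src.
Goal = tuple


def solve_with_goal_stack(n: int) -> List[Tuple[int, int]]:
    stack: List[Goal] = [("transfer", n, 0, 2, 1)]
    moves: List[Tuple[int, int]] = []
    while stack:
        goal = stack.pop()
        if goal[0] == "move":
            moves.append((goal[1], goal[2]))   # primitive goal: execute it
            continue
        _, k, src, dst, via = goal
        if k == 0:
            continue                           # nothing left to transfer
        # Push subgoals in reverse order so that they are popped in execution
        # order. The move of the largest disk is committed to now, although it
        # can only be carried out after the k - 1 smaller disks are moved aside.
        stack.append(("transfer", k - 1, via, dst, src))
        stack.append(("move", src, dst))
        stack.append(("transfer", k - 1, src, via, dst))
    return moves


print(len(solve_with_goal_stack(9)))   # 511
print(solve_with_goal_stack(3))        # [(0, 2), (0, 1), (2, 1), (0, 2), (1, 0), (1, 2), (0, 2)]
```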
Fig. 4. Knowledge Requirements of Decision Making Strategies



Figure 4 shows the number of rules necessary to encode the agents described above.
A similar pattern emerges for both architectures: the number of rules increases with the
complexity of the decision mechanism. This is expected, because control knowledge
becomes increasingly distinct from applicability knowledge in each successive
implementation. As the agents incorporate more and more knowledge, however, we would
expect this effect to diminish and perhaps reverse, because smaller rules, which contain
less control knowledge and are therefore more general, can be more easily reused in
multiple situations. The two versions of the final Soar agent are also displayed in this
plot. Using Soar’s built-in subgoal mechanism lowers the number of required rules, and
when run in Soar-Lite, this implementation achieves very good performance relative to
the other decision-making strategies. The Soar agent that maintains a goal stack (1)
requires slightly more rules than its corresponding CLIPS implementation because it
needs an additional operator (for building the goal stack) that CLIPS does not require.
Both goal-stack agents, however, are more efficient (in terms of both time and rules)
than their corresponding two-phase or three-phase counterparts. This result can be
attributed to two facts. First, the Towers of Hanoi is a naturally recursive problem, and
the ability to commit to an action before it is possible to carry it out results in a simpler
solution to the puzzle. Second, the goal stack provides a lightweight mechanism to support
this implementation without devoting computational resources to unused features.

Finally, because the complexity of the Towers of Hanoi problem has a simple, well-known
closed form, we were able to examine how close the Soar and CLIPS implementations come
to this ideal. This analysis allows us to examine how well the architectures scale as the
time required to solve a problem increases. Figure 5 displays the scalability of Soar and
CLIPS agents that use mutually exclusive productions and segregated control knowledge
to make decisions. Exponential curves were fit to these data, yielding fits of
$0.0019 \cdot 2.025^n$ and $0.002 \cdot 1.941^n$ for the Soar agents, and
$0.0005 \cdot 2.015^n$ and $0.001 \cdot 2.058^n$ for the CLIPS agents, respectively,
where $n$ is the number of disks. All of these fits come very close to the ideal growth
rate of $2^n$ (the optimal solution requires $2^n - 1$ moves), indicating a high potential
for scalability as the time to solution increases.
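Fits of the form a · b^n can be obtained by ordinary least squares on the logarithm of the measured times. The text does not state which fitting procedure was used, so the function below is one plausible choice, and the timing data are synthetic (an invented cost of 1 ms per move multiplied by the 2^n - 1 moves of the optimal solution), not the measurements behind Figure 5.

```python
import math
from typing import Sequence, Tuple


def fit_exponential(disks: Sequence[int], times: Sequence[float]) -> Tuple[float, float]:
    """Fit t = a * b**n by ordinary least squares on log(t); returns (a, b)."""
    if len(disks) != len(times) or len(disks) < 2:
        raise ValueError("need at least two (disks, time) pairs")
    xs = [float(n) for n in disks]
    ys = [math.log(t) for t in times]          # times must be positive
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                          # log of the growth base b
    intercept = mean_y - slope * mean_x        # log of the scale factor a
    return math.exp(intercept), math.exp(slope)


# Synthetic data: a made-up cost of 1 ms per move times 2**n - 1 moves.
disks = list(range(3, 13))
times = [0.001 * (2 ** n - 1) for n in disks]
a, b = fit_exponential(disks, times)
print(f"t ~= {a:.4f} * {b:.3f}**n")
```

Because 2^n - 1 falls slightly below 2^n for small n, the fitted base on such data comes out a little above 2.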




Fig. 5. Growth of Time Requirements in Towers of Hanoi (x-axis: Disks)



7 Eaters 

To gain a deeper understanding of how Soar and CLIPS support higher levels of decision
making, we examined how agents that use the operator and operator-evaluation strategies
perform as they face a greater number of potential actions. In the Eaters environment,
movement can occur only in the cardinal directions. To examine situations in which more
than four actions were under consideration, we modified the agent’s percepts (represented
by short-term memory structures) so that it observed successively more directions in
which it could move (4, 8, 16, 24, 32, 40). This allowed us to make only minimal changes
to the agent’s rules at each successive perceptual increase. A command from the agent to
move in one of the new directions was simply mapped onto one of the four cardinal
directions in a deterministic manner. Figure 6 shows the performance of the Soar and
CLIPS versions of this agent as a function of the average number of possible actions
evaluated during each decision. The agents considering 0-4 candidates used 4 directional
percepts; those considering 4.1-8, 8.1-16, 16.1-24, 24.1-32, and 32.1-40 candidates used
8, 16, 24, 32, and 40 directional percepts, respectively. The figure shows results only for
agents that use a three-phase decision process; however, agents using a two-phase
decision process exhibit the same behavior, indicating that the performance curve is not
an artifact of the decision-making mechanism. The figure depicts three important
phenomena. First, across all numbers of candidates, decision time in Soar remains almost
constant, exhibiting only minor growth. Second, decision time in CLIPS increases sharply
whenever the number of percepts increases. Third, for a fixed number of percepts,
decision time in CLIPS also remains close to constant. Most likely, the discontinuity in
CLIPS performance results from a sensitivity to short-term memory structure (e.g., the
number of slots in an unordered fact) that is not apparent in the corresponding Soar
agents. This indicates a potential problem for building agents in environments that
require a high degree of perceptual complexity, or in which the perceptual complexity may
increase significantly during the course of development.
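The text states only that commands in the additional directions were mapped deterministically onto the four cardinal directions. One hypothetical way to do this, assuming the `num_directions` headings are evenly spaced counter-clockwise from east, is to snap each heading to the nearest cardinal direction, breaking 45° ties toward the counter-clockwise neighbor:

```python
import math

CARDINALS = ["east", "north", "west", "south"]   # counter-clockwise from east


def to_cardinal(direction_index: int, num_directions: int) -> str:
    """Map heading k of num_directions evenly spaced headings to the nearest
    cardinal direction (a hypothetical deterministic mapping)."""
    if num_directions <= 0 or not 0 <= direction_index < num_directions:
        raise ValueError("direction index out of range")
    angle = 360.0 * direction_index / num_directions       # degrees from east
    quadrant = int(math.floor((angle + 45.0) / 90.0)) % 4   # nearest multiple of 90
    return CARDINALS[quadrant]


print([to_cardinal(k, 8) for k in range(8)])
# ['east', 'north', 'north', 'west', 'west', 'south', 'south', 'east']
```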
Fig. 6. Average Time Required Per Decision



8 Conclusion and Future Work 

We have developed a methodology that can be used to evaluate the degree to which a
universal architecture supports the capabilities required by various agents. We have
applied this methodology to the Soar and CLIPS production systems by examining their
ability to support a variety of decision-making mechanisms. Our results serve as a proof
of concept that this approach can be used to build performance profiles for various
architectures that can help illustrate their similarities and differences.

Future work in this area requires increasing the breadth of our study on two fronts.
The first is to examine a new set of capabilities distinct from those used for decision
making, such as the ability to support learning, use fuzzy logic, or make hard real-time
guarantees. The second involves exploring a larger number of architectures of
progressively greater diversity. As the breadth of these two fronts increases, we hope to
be able to establish trends and theories not only about the particular architectural
implementations, but also about their more abstract underpinnings.
Abstract. We present a reactive-system view to describe agents for real-world
applications operating in complex, dynamic, and nondeterministic environments.
We first identify agent tasks and environments to highlight the features desired in
agent architectures for those tasks and environments. We then compare various
architectures according to the identified features.



1 Introduction 

Agents for real applications often operate in complex, dynamic, and nondeterministic
environments. Complex environments make it hard for an agent to build or maintain a
faithful model of the environment. The dynamic nature of environments does not allow
an agent to fully control the changes in the environment, since changes can occur as a
result of the actions of other agents and of exogenous influences, which in turn affect the
behavior of the agents. The nondeterministic nature of environments makes it impossible
to predict with certainty the results of actions and future situations. Agent systems for
real applications thus need the capability to function in worlds with exogenous events,
other agents, and uncertain effects.

Our approach to the problem of describing the actions of agents working in such
complex, unpredictable, and nondeterministic environments is to regard agents as
reactive systems. The key point concerning reactive systems is that they maintain an
ongoing interaction with the environment, in which intermediate outputs of the agent can
influence subsequent intermediate inputs to the agent. Over the years, this reactive-system
view of agents has often been used to describe the behavior of agents in dynamic
environments [6, 13].
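The defining property of a reactive system, that the agent's intermediate outputs influence its subsequent inputs, can be shown with a toy loop. The integer-valued environment, the `toward_zero` policy, and the disturbance sequence standing in for exogenous events are illustrative assumptions, not part of any architecture discussed here.

```python
from typing import Callable, List, Sequence


def run_reactive(policy: Callable[[int], int], initial: int,
                 disturbances: Sequence[int]) -> List[int]:
    """Toy reactive loop over an integer-valued environment state.

    Each tick the agent senses the state and outputs an action; the
    environment then applies both that action and an exogenous disturbance,
    so the agent's intermediate outputs shape its subsequent inputs.
    """
    state = initial
    trace: List[int] = []
    for drift in disturbances:
        percept = state                   # intermediate input to the agent
        action = policy(percept)          # intermediate output of the agent
        state = state + action + drift    # effect of the action plus exogenous change
        trace.append(state)
    return trace


def toward_zero(percept: int) -> int:
    """Hypothetical policy: step the state toward zero."""
    if percept > 0:
        return -1
    if percept < 0:
        return 1
    return 0


print(run_reactive(toward_zero, 3, [0, 0, 0, 0]))  # [2, 1, 0, 0]
print(run_reactive(toward_zero, 3, [0, 2, 0, 0]))  # [2, 3, 2, 1]
```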

In this paper, we first identify popular agent tasks and the characteristics of the
environments in which agents need to operate. Second, we evaluate various agent
architectures designed for these tasks and environments. Finally, we discuss the lessons
learned from our development of agent architectures.

2 Tasks and Environments 

In this section, we first lay out assumptions about the tasks and environments in which
multiple agents operate. From these assumptions, we can draw out the common features
of these tasks and environments. These features will provide a basis for the comparison of
a wide variety of agent architectures and rationales for our choice of an agent architecture.

N.R. Jennings and Y. Lesperance (Eds.): Intelligent Agents VI, LNAI 1757, pp. 132-146, 2000.
(c) Springer-Verlag Berlin Heidelberg 2000

Reactive-System Approaches to Agent Architectures 133

Tasks and environments relevant to reactive agent systems include: 

Information Services: Dynamic teaming and interactions among information services 
in a digital library. 

Coordinated Unmanned Vehicles: Coordinated, reactive mission achievement by
unmanned vehicles that are decentrally controlled.

Operator-in-loop Coordination: Responsive display of situated intelligence that is 
coordinated with the needs and expectations of an operator. 

Automated Monitoring and Controlling Systems: Dynamic movement of time-critical 
tasks among software and human agents in an automated monitoring and controlling 
system. 

Flexible Manufacturing Systems: Coordinated activity without deadlocks or resource 
conflicts in a flexible manufacturing system. 

For these tasks and environments, each application exhibits specific characteristics
and requires certain capabilities. For the purpose of comparison, we list some common
features below:

Real-time execution: The agent needs to be responsive to its environment and
predictably fast enough to act on changes in the environment.

Interruptible execution: A dynamically changing environment demands that the agent 
be able to stop its current activity and gracefully switch to other more urgent or higher 
priority activities. 

Real-time execution and interruptible execution are necessary because most tasks
of interest require an agent to respond fast enough to react to changes in the
environment, and responding to changes requires that an agent be able to stop one
activity and switch to another.

Multiple foci of attention: The agent needs to maintain multiple foci of attention to
interact with multiple agents and the environment simultaneously. This ability requires
that an agent’s activity be not only interruptible but also resumable. Agents with this
capability are typically implemented with multiple threads of execution. The agent
preserves resumable activities by saving their context. However, there is a tradeoff
between the overhead of saving the context of a previous activity and the overhead of
reestablishing that context. The balance between these two overheads depends on how
dynamically the environment is changing.

Hierarchical plan refinement and revision: The agent can progressively decompose
high-level tasks into smaller subtasks. High-level tasks are considered independent
of lower-level tasks, and the low-level tasks are chosen based on the current high-level
goals and the current situation. The agent can mix and match among the relative
strengths of the plans being combined.

Purposeful behavior (minimizing high-level plan revision): The agent works
methodically toward its goals in a coherent manner. Purposeful behavior is easy for
others, including humans, to understand, and it minimizes high-level plan revisions
because even if the details of how things are done change, the high-level methods stay
consistent.

Purposeful behavior and adherence to predefined strategies are desired features in
domains where others need to recognize the plans of the performing agent or rely on
the agent’s expected behavior, such as Coordinated Unmanned Vehicles, Operator-in-loop
Coordination, Flexible Manufacturing Systems, and Information Services.

Adherence to predefined strategies: The agent behaves in a way that is consistent with 
what others expect (like following doctrine or executing interaction plans exactly as 
the operator told it to). Operator-in-loop Coordination and Automated Monitoring 
and Controlling Systems rely heavily on this feature. 

Task migration: An agent architecture capable of task migration needs the ability to
shed tasks to, or accept tasks from, other agents. Cooperating agents often need to
reallocate workload among themselves: when a workload imbalance occurs, the
appropriate agents may shed or accept workload. This capability is critical in
Automated Monitoring and Controlling Systems and Information Services.

Goal-driven and data-driven behavior: The agent’s behavior is triggered not only by 
high-level statements about a state of the world to achieve or activities to perform, 
but also by low-level, perhaps transient data states, such as proximity to danger. 
Coordinated Unmanned Vehicles and Information Services especially demand both 
goal-driven and data-driven behavior. 

Checkpointing and mobility: The agent can capture its run-time state in the middle of
execution and subsequently restore that captured state to an executing state, possibly
on a different machine. This feature can be especially beneficial to tasks such as
Information Services and Automated Monitoring and Controlling Systems.

Explicit strategy articulation: The agent can explicitly articulate, transfer, and modify
strategies. The internal representation of strategies needs to be explicitly expressible
and convertible into a transferable textual representation. The transferred strategies
also need to be dynamically interpreted and incorporated. This capability is required
for the agents in Automated Monitoring and Controlling Systems, Flexible Manufacturing
Systems, and Information Services to coordinate effectively.

Situation summary and report: The agent can summarize local situations and report 
them to other agents. The reports should provide the receiving agents with a global 
view of the situation and allow them to coordinate with other agents effectively. All 
the tasks we listed above seem to need this capability to coordinate with other agents 
or operators. 

Restrainable reactivity: Coordinating with other agents involves commitment to a
constrained choice of behaviors. Coordinated Unmanned Vehicles and Flexible
Manufacturing Systems especially need this ability to constrain reactivity to enable
coordination.

Prediction of future activities: The system can predict future activities to enable
coordination. The coordination process needs to recognize and predict interdependencies.
Coordinated Unmanned Vehicles and Flexible Manufacturing Systems demand this
feature.
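Interruptible, resumable execution with multiple foci of attention can be sketched with Python generators, whose suspended frames play the role of saved activity context. The priority scheme, the one-step-per-tick scheduler, and all names are illustrative assumptions rather than a mechanism taken from any of the surveyed architectures; in this toy setting saving context is essentially free, so the saving-versus-reestablishing tradeoff noted under multiple foci of attention does not show up.

```python
import heapq
from typing import Dict, Generator, List, Tuple

Activity = Generator[str, None, None]


def activity(name: str, steps: int) -> Activity:
    """A resumable activity; its suspended frame (e.g. the loop counter) is
    the saved context that lets it continue where it was interrupted."""
    for i in range(steps):
        yield f"{name}:{i}"


def run(arrivals: Dict[int, List[Tuple[int, str, int]]]) -> List[str]:
    """arrivals maps a tick to (priority, name, steps) triples, where a lower
    priority number is more urgent. Each tick advances the most urgent ready
    activity by one step, so a newly arrived urgent activity interrupts the
    current one, which resumes once nothing more urgent remains."""
    ready: List[Tuple[int, int, Activity]] = []
    log: List[str] = []
    seq = 0                                  # tie-breaker: FIFO among equal priorities
    tick = 0
    last_arrival = max(arrivals, default=-1)
    while ready or tick <= last_arrival:
        for prio, name, steps in arrivals.get(tick, []):
            heapq.heappush(ready, (prio, seq, activity(name, steps)))
            seq += 1
        if ready:
            prio, order, act = heapq.heappop(ready)
            step = next(act, None)           # None means the activity has finished
            if step is not None:
                log.append(step)
                heapq.heappush(ready, (prio, order, act))
        tick += 1
    return log


print(run({0: [(2, "patrol", 3)], 1: [(1, "evade", 2)]}))
# ['patrol:0', 'evade:0', 'evade:1', 'patrol:1', 'patrol:2']
```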

In Table 1, a cell marked with ✓ indicates that the task in that column generally
requires the feature specified by that row. The table is not intended to be precise; its
purpose is to identify the general requirements of the tasks and environments we are
concerned with.









Feature                                  CUV  OLC  AMCS  FMS  IS
Real-time execution                      ✓    ✓    ✓     ✓
Interruptible execution                  ✓    ✓    ✓     ✓    ✓
Multiple attentions                      ✓               ✓    ✓
Hierarchical plan refinement & revision  ✓         ✓          ✓
Purposeful behavior                      ✓               ✓    ✓
Adherence to predefined strategies       ✓               ✓    ✓
Task migration                                     ✓          ✓
Goal-driven & data-driven behavior       ✓    ✓    ✓     ✓    ✓
Checkpointing & mobility                           ✓          ✓
Explicit strategy articulation                     ✓     ✓    ✓
Situation summary & report               ✓         ✓     ✓    ✓
Restrainable reactivity                  ✓    ✓          ✓
Prediction of future activities          ✓               ✓

Table 1. Coordination Tasks and Common Required Features. ✓ indicates that the task
generally requires the corresponding feature. Columns: CUV = Coordinated Unmanned
Vehicles; OLC = Operator-in-loop Coordination; AMCS = Automated Monitoring & Controlling
Systems; FMS = Flexible Manufacturing Systems; IS = Information Service.
3 Features and Systems

Most of the systems surveyed in this section are Turing complete in that they can each
implement almost any functionality we can specify. Thus the real issue in comparing these
systems is not their absolute computational ability or inability, but how “naturally” each
system’s capabilities match a specific requirement. For example, as we discuss in this
section, the Soar architecture has tremendous versatility in the manner in which problems
may be solved, as it is intended to exhibit general, intelligent behavior. However, this
generality and versatility may not match a specific requirement and can induce
inefficiency for certain problems.

Table 2 summarizes the survey of reactive plan execution systems in terms of their 
relation to the common required features for coordinated reactive plan execution. 

3.1 Adaptive, Intelligent Systems (AIS) 

Adaptive, Intelligent Systems (AIS) [7] are intelligent systems designed to interact with 
other intelligent entities in real time. The AIS agent architecture hierarchically organizes 
component systems for perception, action, and cognition processes. Perception processes 
acquire, abstract, and filter sensed data before sending it to other components. Action 
systems control the execution of external actions on effectors. Perception can influence 
action directly through reflex arcs or through perception-action coordination processes. 
The cognition system interprets perceptions, solves problems, makes plans, and guides 
both perceptual strategies and external action. These processes operate concurrently and 
asynchronously. 

The basic execution cycle consists of updating an agenda of possible actions, scheduling
an action, and executing the action. This execution cycle is satisficing, meaning that
each sub-step in the cycle does not necessarily run to completion (identifying all
possibilities) but may instead be terminated by some interrupt condition. Rather than
having distinct reactive and more deliberative components, an AIS brings to bear all the
information it can within this satisficing reasoning cycle and, when interrupted, chooses
its next step among whatever information it has available.
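The satisficing cycle can be illustrated with a small Python sketch: the agenda is updated one knowledge source at a time until an interrupt condition fires, and the scheduler then picks the best candidate found so far. The (rating, action) representation, the names, and the counter-based interrupt are hypothetical; in practice the interrupt test might compare `time.monotonic()` against a deadline.

```python
from typing import Callable, List, Optional, Tuple

Candidate = Tuple[float, str]                        # (rating, action)
KnowledgeSource = Callable[[], Optional[Candidate]]  # proposes an action or nothing


def satisficing_cycle(sources: List[KnowledgeSource],
                      interrupted: Callable[[], bool]) -> Optional[str]:
    """One hypothetical agenda cycle: update the agenda until interrupted,
    then schedule the best action found so far and return it for execution."""
    agenda: List[Candidate] = []
    for source in sources:                 # agenda update, may stop early
        if interrupted():
            break
        candidate = source()
        if candidate is not None:
            agenda.append(candidate)
    if not agenda:
        return None
    _, action = max(agenda)                # scheduling: highest-rated candidate
    return action                          # execution is left to the caller


sources = [lambda: (0.4, "log-reading"), lambda: None, lambda: (0.9, "raise-alarm")]
budget = iter([False, False, True])        # allow two agenda updates, then interrupt
print(satisficing_cycle(sources, lambda: next(budget)))   # log-reading
print(satisficing_cycle(sources, lambda: False))          # raise-alarm
```

The interrupted call settles for the lower-rated action because the better candidate was never reached, which is the trade the satisficing cycle makes for bounded response time.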

The AIS system’s dynamic control planning is its means of preserving coherence. 
Although all the reasoning and interactive processes run in parallel, the current control 
plans are universally accessible, providing some global coherence in light of multiple, 
simultaneous goals. 

The representation used in AIS, also derived from BB1, is a uniform conceptual
graph structure. AIS knowledge sources can be organized so that they represent a
hierarchical decomposition of some task. Knowledge sources posted to the blackboard
are activated based on trigger conditions.

The fact that events may occur both concurrently and asynchronously suggests that
AIS is a parallel architecture. Such parallelism affects both the sensing methods and the
style of control used by an AIS. However, cognitive processes occur serially in the
architecture.

An AIS can pursue more than one goal at a time through its use of dynamic control
plans. The agenda contains operations chosen according to the set of all currently
active control plans. The action ultimately chosen depends on certain preferences of the
scheduler, but it will make progress on one of the plans, and in turn, toward one of the
goals. The maintenance of multiple plans keeps the system free to choose actions that
will make progress toward any of the goals, so it is not limited to one particular plan.

Feature                                  AIS  ATLANTIS  ERE  RAP  Soar  Theo  UM-PRS  JAM
Real-time execution                      ✓    ✓         ✓    ✓    ✓     ✓     ✓       ✓
Interruptible execution                  ✓    ✓         ✓    ✓    ✓     ?     ✓       ✓
Multiple attentions                      ✓    ✓         ?    ✓    ✓     ✓     ✓       ✓
Hierarchical plan refinement & revision  ✓    ✓         ✓    ✓    ✓     ?     ✓       ✓
Purposeful behavior                      ✓    ✓         ✓    ✓    ✓     ✓     ✓       ✓
Adherence to predefined strategies       ✓    ?         ✓    ✓    ✓     ?     ✓       ✓
Task migration                           ?    ?         ?    ?    ?     ?     ✓       ✓
Goal-driven & data-driven behavior       ✓    ✓         ✓    ✓    ✓     ✓     ✓       ✓
Checkpointing & mobility                                                              ✓
Explicit strategy articulation           ?    ?         ?    ?    ?     ✓     ✓       ✓
Situation summary & report               ?    ?         ?    ?    ?     ✓     ✓       ✓
Restrainable reactivity                  ?    ✓         ✓    ?    ✓     ?     ✓       ✓
Prediction of future activities          ?    ?         ✓    ?    ?     ?     ?       ?

Table 2. Common Required Features and Reactive Systems. ✓ indicates that the system has
demonstrated the corresponding feature, and ? indicates that the ability of the system to
meet the feature is unknown or unlikely.



3.2 ATLANTIS 

ATLANTIS [5] is designed to enable an agent to operate in a continuously dynamic
world, complete with unpredictability and imperfections. The design of the ATLANTIS
architecture was based on the belief that competent behavior in a complex, dynamic
environment demands different types of simultaneous activity. Quick reactivity is
important for coping with dynamism, but planning is necessary to deal with complexity.

The ATLANTIS architecture explicitly addresses the need for both reactive and
deliberative behavior in an execution system. ATLANTIS supports three specialized layers,
operating in parallel, to facilitate the required simultaneous activity. The control layer
directly reads sensors and sends reactive commands to the effectors based on the readings;
the stimulus-response mapping it uses is given to it by the sequencing layer. The
sequencing layer has a higher-level view of robotic goals than the control layer. It tells
the control layer below it when to start and stop actions, and it initiates processes in the
deliberative layer. The deliberative layer, in turn, responds to requests from the
sequencing layer to perform time-intensive deliberative activities such as internal planning.

The three layers of the ATLANTIS architecture run asynchronously at successively longer cycle times (i.e., the control layer has the shortest cycle time and the deliberative layer the longest) and at successively higher levels of abstraction. Their asynchronous operation keeps tasks performed at different levels from interfering with each other. It also allows the architecture to perform each task on a time scale appropriate to that task. Thus, the control layer can react almost immediately to a change in the environment while the sequencing and deliberative layers continue their processing without interruption.
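The effect of successively longer cycle times can be illustrated with a toy discrete-time simulation. This is a minimal sketch, not ATLANTIS code. The `Layer` class, the `simulate` function, and the cycle periods (1, 5 and 20 ticks) are hypothetical. A shared tick counter stands in for the truly concurrent execution of the real architecture. Each layer does work only when its own cycle comes due, so the control layer acts on every tick while the slower layers update in the background.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Layer:
    name: str
    period: int                                   # hypothetical cycle time, in ticks
    runs: List[int] = field(default_factory=list)

    def maybe_step(self, tick: int) -> None:
        # A layer does work only when its own cycle comes due, so slow
        # layers never hold up the fast control layer.
        if tick % self.period == 0:
            self.runs.append(tick)


def simulate(ticks: int) -> Dict[str, int]:
    layers = [
        Layer("control", period=1),         # reads sensors, drives effectors
        Layer("sequencing", period=5),      # starts/stops control activities
        Layer("deliberative", period=20),   # time-intensive planning
    ]
    for tick in range(ticks):
        for layer in layers:
            layer.maybe_step(tick)
    return {layer.name: len(layer.runs) for layer in layers}


print(simulate(40))  # {'control': 40, 'sequencing': 8, 'deliberative': 2}
```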

In addition to the replanning that can be done in the case of failures, the architecture's tasks are interruptible. This allows for a hierarchy of importance levels.

ATLANTIS supports multiple simultaneous goals through deliberation mechanisms that can select a particular goal and plan the actions needed to achieve it. The task queue can hold multiple tasks, several of which can be enabled at the same time, so an ATLANTIS agent appears to be able to pursue multiple tasks simultaneously.

The ATLANTIS architecture, with its hierarchical structure, exhibits both central and distributed control. While the overall actions of the system are ultimately controlled by the sequencing layer, the asynchronous operation of the architecture gives each layer a certain amount of autonomy in executing tasks at its own level.



3.3 Entropy Reduction Engine (ERE) 

ERE [1] is an architecture for the integration of planning, scheduling, and control. The objective of this system is to create a set of software tools for designing and deploying integrated planning and scheduling systems that are able to effectively control their environments. ERE works toward integrating planning and scheduling, and it treats plan execution as a problem of control, mediated by the interaction of three architectural modules. The three modules, the Reductor, the Projector, and the Reactor, run asynchronously with respect to each other, and each module communicates with the others whenever it has results.

The Reductor module synthesizes appropriate problem-solving strategies for a given problem into a graph representation, using whatever problem-reduction rules are available. The Reductor uses this expertise to recursively decompose the given behavioral constraint, derive increasingly specific problem-solving strategies, and pass these strategies on to the Projector.

The Projector uses the strategies generated by the Reductor as search control to plan and schedule appropriate actions. Once an appropriate behavior is identified, the result of this temporal planning, a situated control rule (SCR), is synthesized and passed to the Reactor module for execution. SCRs are generalized by a goal-regression algorithm and stored permanently in the Reactor for reuse.

The Reactor executes control rules derived from the Projector’s plans. It also incre- 
mentally compares current sensor values against the behavioral constraint to determine if 
the current goal has been satisfied. The Reactor monitors the actual results of its actions 
and can adjust its actions accordingly. 

Prediction is possible in ERE because it has a causal theory that describes the control actions the system can take, the exogenous events that are outside the system's control, and the domain constraints that specify facts that can never co-occur. Control actions are defined by their preconditions and by probability distributions over their possible effects. Exogenous events are used by the Projector to reason about possible system behavior. Domain constraints are used throughout the system to maintain world-model consistency.
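A causal theory of this kind can be sketched as data plus a projection function. This is only an illustration, not ERE's representation. The names `ControlAction`, `CONSTRAINTS`, `consistent` and `project`, the set-of-facts state encoding, and the door example are all hypothetical. Each control action has preconditions and a probability distribution over its effects. Projecting an action from a state yields a distribution over successor states. Any successor that violates a domain constraint is discarded. Exogenous events are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

Facts = FrozenSet[str]
Outcome = Tuple[float, Facts, Facts]        # (probability, added, deleted)


@dataclass(frozen=True)
class ControlAction:
    name: str
    preconditions: Facts
    outcomes: Tuple[Outcome, ...]


# Hypothetical domain constraint: these two facts can never co-occur.
CONSTRAINTS: List[Facts] = [frozenset({"door-open", "door-closed"})]


def consistent(state: Facts) -> bool:
    return not any(pair <= state for pair in CONSTRAINTS)


def project(state: Facts, action: ControlAction) -> Dict[Facts, float]:
    """Distribution over successor states; empty if the action is inapplicable."""
    if not action.preconditions <= state:
        return {}
    dist: Dict[Facts, float] = {}
    for prob, added, deleted in action.outcomes:
        nxt = (state - deleted) | added
        if consistent(nxt):                 # drop constraint-violating outcomes
            dist[nxt] = dist.get(nxt, 0.0) + prob
    return dist


open_door = ControlAction(
    "open-door",
    preconditions=frozenset({"at-door", "door-closed"}),
    outcomes=(
        (0.9, frozenset({"door-open"}), frozenset({"door-closed"})),
        (0.1, frozenset(), frozenset()),    # the door sticks: nothing changes
    ),
)
start = frozenset({"at-door", "door-closed"})
for successor, p in project(start, open_door).items():
    print(sorted(successor), p)
```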

The ERE system uses an anytime algorithm to compute a response to its situation: each of the three control modules (Reductor, Projector, and Reactor) continually increases the quality of its output as time advances. This property allows the agent to function in environments where response time is limited. The Reactor can start executing partially completed plans, which allows real-time operation. As a result, ERE behaves increasingly rationally as it is given more time to react, but it can also react quickly (with less certainty in its actions) when it needs to. Recognizing a time constraint on processing allows ERE to act with bounded rationality.
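The defining property of an anytime algorithm is that it can be interrupted at any point and still return its best answer so far. The quality of that answer does not decrease as more time is allowed. The text does not detail ERE's own algorithms, so the following is a generic sketch. A step budget stands in for wall-clock time to keep the example deterministic. The function `anytime_best`, the candidate plans and the cost-based score are hypothetical.

```python
from itertools import islice
from typing import Callable, Iterable, Optional, Tuple, TypeVar

P = TypeVar("P")


def anytime_best(candidates: Iterable[P], quality: Callable[[P], float],
                 budget: int) -> Tuple[Optional[P], float]:
    """Evaluate at most `budget` candidates and return the best found so far.

    Stopping early still yields a usable answer, and for a fixed candidate
    order the returned quality never decreases as the budget grows.
    """
    best: Optional[P] = None
    best_q = float("-inf")
    for cand in islice(candidates, budget):
        q = quality(cand)
        if q > best_q:
            best, best_q = cand, q
    return best, best_q


def score(plan: Tuple[str, int]) -> float:
    return -plan[1]                     # hypothetical: lower cost is better


plans = [("detour", 9), ("main-road", 7), ("shortcut", 3)]
print(anytime_best(plans, score, budget=1))  # (('detour', 9), -9)
print(anytime_best(plans, score, budget=3))  # (('shortcut', 3), -3)
```

A short budget corresponds to reacting quickly with less certainty. A longer budget lets the agent act more rationally.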



3.4 The RAP System 

The RAP system [3,4] is designed for the reactive execution of symbolic plans. A plan is assumed to include goals, or tasks, at a variety of levels of abstraction, and the RAP system attempts to carry out each task in turn, using different methods in different situations and dealing with common problems and simple interruptions.

In the RAP system, a task is described by a RAP, which is effectively a context-sensitive program for carrying out the task. A RAP can also be thought of as describing a variety of plans for achieving the task in different situations. Each RAP is a separate entity that pursues a goal, possibly with multiple methods, until that goal is achieved. In pursuing a goal, RAPs can process global world-model data and execute actions that change the model and the outside world. RAPs can also place new RAPs on the execution queue and suspend themselves, implementing sequential and hierarchical control. The RAP interpreter, running on a single computer, acts as a dynamic, non-preemptive multiprocessing scheduler, choosing the next RAP to run from a queue.

The RAP system carries out tasks using the following algorithm. First, a task is 
selected for execution and, if it represents a primitive action, it is executed directly, 
otherwise its corresponding RAP is looked up in the library. Next, that RAP’s check 
for success is used as a query to the situation description and, if satisfied, the task is 
considered complete and the next task can be run. However, if the task has not yet 
been satisfied, its method-applicability tests are checked and one of the methods with 
a satisfied test is selected. Finally, the subtasks of the chosen method are queued for 
execution in place of the task being executed, and that task is suspended until the chosen 
method is complete. When all subtasks in the method have been executed, the task 
is reactivated and its completion test is checked again. If all went well, the completion 
condition will now be satisfied and execution can proceed to the next task. If not, method 
selection is repeated and another method is attempted. 
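The execution algorithm above can be sketched in a few lines of Python. This is a simplified, sequential sketch, not the RAP interpreter. Recursion replaces the execution queue and non-preemptive scheduler. The first applicable method is chosen, and "another method is attempted" is modelled by skipping methods already tried. The `Rap`/`Method` classes, the dictionary world model and the door example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

World = Dict[str, bool]                     # hypothetical situation description


@dataclass
class Method:
    applicable: Callable[[World], bool]     # method-applicability test
    subtasks: List[str]                     # run in place of the parent task


@dataclass
class Rap:
    succeeded: Callable[[World], bool]      # check for success
    methods: List[Method]


def execute(task: str, library: Dict[str, Rap],
            primitives: Dict[str, Callable[[World], None]], world: World) -> bool:
    if task in primitives:                  # primitive action: execute directly
        primitives[task](world)
        return True
    rap = library[task]                     # otherwise look up its RAP
    tried = set()
    while not rap.succeeded(world):         # success check queried first
        options = [i for i, m in enumerate(rap.methods)
                   if i not in tried and m.applicable(world)]
        if not options:
            return False                    # no untried applicable method left
        choice = options[0]
        tried.add(choice)
        for sub in rap.methods[choice].subtasks:   # parent suspended meanwhile
            if not execute(sub, library, primitives, world):
                break
        # Parent reactivated: the loop re-checks its completion test.
    return True


def open_door(w: World) -> None:
    if not w.get("jammed", False):
        w["door_open"] = True


def walk_in(w: World) -> None:
    if w["door_open"]:
        w["in_room"] = True


library = {"enter-room": Rap(
    succeeded=lambda w: w["in_room"],
    methods=[Method(lambda w: w["door_open"], ["walk-in"]),
             Method(lambda w: True, ["open-door", "walk-in"])])}
primitives = {"open-door": open_door, "walk-in": walk_in}

world = {"door_open": False, "in_room": False}
print(execute("enter-room", library, primitives, world), world)
```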

Because a plan consists of RAP-defined tasks at various levels of abstraction, execution monitoring becomes an intrinsic part of the execution algorithm, and the need for separate replanning on failure disappears. RAPs are not just programs that run at execution time; they are also hierarchical building blocks for plan construction. The RAP representation is structured to make a task's expected behavior evident for use in planning as well as in execution.

3.5 The Soar Architecture 

Soar [9] is intended to exhibit general, intelligent behavior and has been applied to a variety of tasks. Because it incorporates the universal weak methods, Soar can approach problem solving with a variety of methods, depending on the knowledge it has. This gives it great versatility in how problems are solved and may also yield different planning strategies. This versatility also derives from the architectural mechanism of subgoaling, which generates all the goals necessary for reasoning in a problem space. The same mechanism can give rise to a variety of learning methods, even though there is a single learning mechanism.

Soar represents all tasks as collections of problem spaces. Problem spaces are made 
up of a set of states and operators that manipulate the states. Soar begins work on a task 
by choosing a problem space, then an initial state in the space. 

Soar represents the goal of the task as some final state in the problem space, and it repeats its decision cycle as necessary to move from the initial state to that goal state. Each decision cycle consists of a number of knowledge retrievals, corresponding to the firing of (possibly many) productions. Productions are the uniform unit of knowledge in Soar. The knowledge-retrieval cycles are completely reactive, matching information in internal memory and input against the conditions specified by productions. Knowledge retrievals continue until quiescence, when no new production matches. At this point, preference knowledge about states and operators is compared. If a new state or operator is uniquely preferred, the decision procedure installs that item into the operator-state context stack. Items in the context persist over subsequent decision cycles and can be removed only by explicit termination knowledge (for operators) or by architectural impasse resolution (for states).

Impasses are generated whenever a unique decision cannot be made about some context slot. The architecture responds to an impasse by establishing an architectural goal to resolve it. When the impasse is resolved, an explanation-based learning mechanism, chunking, compiles the results of the impasse resolution into a rule applicable at the level of resolution.

Soar productions encode knowledge about what decisions to make in different situa- 
tions. To make a decision (choosing what goal to pursue, operator to apply, etc.), Soar 
tries to match and fire all of its productions repeatedly, until no new productions match. 
The decision is then made based on all the knowledge retrieved from the production 
firings. If the productions do not provide enough knowledge to make a decision, the sy- 
stem recursively subgoals to solve the problem of “making the decision.” The integrated 
Soar learning mechanism (chunking) builds new productions that summarize the search 
performed to solve problems. 
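The two phases of a Soar decision can be sketched as follows. First, all productions fire repeatedly until quiescence. Then preferences are compared, and either a uniquely preferred operator is selected or an impasse is reported. This is a heavily simplified illustration, not Soar's implementation or syntax. The fact encoding, the use of only "acceptable" and "best" preferences, and the two example productions are hypothetical, and chunking is not modelled. The `while True` loop also shows the predictability concern discussed next: how many passes it takes depends on the contents of the knowledge base.

```python
from typing import Callable, FrozenSet, List, Set, Tuple

Fact = Tuple[str, str]
Production = Callable[[FrozenSet[Fact]], Set[Fact]]


def elaborate(wm: Set[Fact], productions: List[Production]) -> Set[Fact]:
    """Fire all productions (in parallel per pass) until quiescence."""
    wm = set(wm)
    while True:
        new: Set[Fact] = set()
        for produce in productions:
            new |= produce(frozenset(wm)) - wm
        if not new:
            return wm                       # quiescence: nothing new matched
        wm |= new


def decide(wm: Set[Fact]) -> Tuple[str, object]:
    """Select a uniquely preferred operator, or report an impasse."""
    acceptable = {op for kind, op in wm if kind == "acceptable"}
    best = {op for kind, op in wm if kind == "best"} & acceptable
    pool = best or acceptable
    if len(pool) == 1:
        return ("select", next(iter(pool)))
    return ("impasse", sorted(pool))        # Soar would create a subgoal here


productions: List[Production] = [
    lambda wm: ({("acceptable", "move-left"), ("acceptable", "move-right")}
                if ("see", "obstacle-ahead") in wm else set()),
    lambda wm: ({("best", "move-left")}
                if ("see", "wall-right") in wm else set()),
]
print(decide(elaborate({("see", "obstacle-ahead")}, productions)))
print(decide(elaborate({("see", "obstacle-ahead"), ("see", "wall-right")},
                       productions)))
```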

From a predictability perspective, Soar's flexible decision-making approach has the disadvantage that arbitrarily large amounts of subgoaling and production matching may occur. To avoid subgoaling, Soar encodes one type of reactive knowledge as productions indicating that particular operators must be selected in a given situation. However, this reaction technique still incurs the uncertain delay of firing all productions until quiescence and then deciding to apply the chosen operator. Even if the match time were a known constant, repeatedly firing productions until quiescence is still a computation of uncertain duration that scales with the size of the knowledge base.



3.6 Theo 

Theo-Agent [14] integrates planning, reacting and knowledge compilation. In Theo, 
learning is interleaved with general problem solving and self-reflection. This allows 
the architecture to react appropriately when the environment presents opportunities for 
learning or self-reflection. 

Theo-Agents have stimulus-response (S-R) rules, which give them the ability to respond rapidly to sensor data in navigational and robotic tasks in a complex and dynamic environment, approximating full reactivity when there is a complete set of S-R rules for the given domain. How often the rules are checked for applicability is a function of the sensor update policy: sensors may be updated either frequently or on an as-needed basis.

The architecture reacts whenever possible, deferring to decomposition only when necessary. Results of decomposition are compiled into reactive S-R rules, which are added incrementally to the agent's knowledge base. Internal drives (goals) and the world state are used to determine an action during each architecture cycle. When actions are chosen after planning, an explanation-based learning (EBL) algorithm compiles the planning into a rule that preserves the correctness of the planning inferences.
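The react-first, plan-when-necessary cycle with rule compilation can be sketched as below. The `TheoLikeAgent` class and `toy_planner` are hypothetical. The "compilation" step is only exact-match caching of the planned situation and action. That is a deliberate stand-in: real EBL would generalize the rule beyond the single situation that was planned for.

```python
from typing import Callable, Dict, Hashable, Optional


class TheoLikeAgent:
    """React via S-R rules when one matches; otherwise plan and compile a rule."""

    def __init__(self, planner: Callable[[Hashable], Optional[str]]):
        self.rules: Dict[Hashable, str] = {}    # stimulus -> response
        self.planner = planner                  # slow decomposition/search
        self.planning_calls = 0

    def act(self, situation: Hashable) -> Optional[str]:
        if situation in self.rules:             # react whenever possible
            return self.rules[situation]
        self.planning_calls += 1                # defer to decomposition
        action = self.planner(situation)
        if action is not None:
            # Stand-in for EBL: store the exact situation -> action pair.
            # Real EBL would generalize the rule beyond this one situation.
            self.rules[situation] = action
        return action


def toy_planner(situation: Hashable) -> Optional[str]:
    # Hypothetical planner that only handles one (drive, percept) pair.
    return "turn-left" if situation == ("avoid", "obstacle") else None


agent = TheoLikeAgent(toy_planner)
print(agent.act(("avoid", "obstacle")), agent.planning_calls)  # turn-left 1
print(agent.act(("avoid", "obstacle")), agent.planning_calls)  # turn-left 1
```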
Theo stores all knowledge in a frame representation. This uniform representation allows all knowledge to be accessed and manipulated. An impasse for Theo's frame-based memory system is the existence of a slot without a value, and Theo's job is to find values for slots that do not have one. Because it works only where it needs answers, Theo's behavior is highly focused. Because it explicitly works where it lacks the knowledge needed to solve the immediate problem, Theo is impasse-driven.

Since goals are based on drives in Theo-Agent, multiple goals may be activated 
simultaneously. Based on prioritization rules, one of the goals may be selected as the 
one to pursue. It is not clear whether the priorities can change dynamically and allow 
essentially interleaved goal pursuit. It is also not clear whether planning and execution 
occur simultaneously; it seems that they do not, because planning is only done in response 
to the inability to react. 

Self-reflection is an extension of the system's use of meta-knowledge to derive the certainty of conclusions, the costs of actions, etc. Self-reflection gives Theo the ability to determine the best action or method to use in any situation.

Theo-Agent lacks any notion of persistence: all active drives and beliefs depend entirely on sensor input. Therefore, Theo-Agent cannot reason about anything not directly sensed in its local environment.

3.7 UM-PRS 

UM-PRS [12] is an implementation of the conceptual framework provided by the Procedural Reasoning System (PRS) [6]. UM-PRS is written entirely in C++, specifically to meet the real-time requirements of real-world applications, and has been thoroughly tested over years of applications.

UM-PRS maintains a library of alternative procedures for accomplishing goals and 
selectively invokes and elaborates procedures based on the evolving context. In contrast 
to the traditional deliberative planning systems, UM-PRS continuously tests its decisions 
against its changing knowledge about the world, and can redirect the choices of actions 
dynamically while remaining purposeful to the extent allowed by the unexpected changes 
to the environment. 

The intention structure in UM-PRS maintains information related to the runtime 
state of progress made toward the system’s top-level goals. The agent will typically have 
more than one of these top-level goals, and each of these goals may, during execution, 
invoke subgoals to achieve lower-level goals. With the current implementation, a goal 
that is being pursued may be interrupted by a higher priority goal and then later resumed 
(if possible). 

When a goal is suspended because a higher-priority goal has become applicable, the current state of execution of the suspended goal is stored. When the suspended goal becomes the highest-priority goal again, it is reactivated. The reactivation process is identical to normal execution. However, because the world model may have changed during the pursuit of other goals, execution resumes at the place where it was suspended only when the contexts of the resumed top-level goal and its subgoals are all still valid.
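Priority-based interruption and context-checked resumption can be sketched with a tiny interpreter loop. This is not UM-PRS code. The `Intention` class, the `run_cycle` function and the delivery/recharge scenario are hypothetical. The context is rechecked on every cycle. When the context of a partially executed intention is no longer valid, this sketch sends it back to its first step. That fallback is an assumption, because the text does not say what happens in that case.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Intention:
    goal: str
    priority: int
    steps: List[str]
    context: Callable[[Dict[str, bool]], bool]   # must hold to resume in place
    next_step: int = 0                           # stored execution state


def run_cycle(intentions: List[Intention], world: Dict[str, bool],
              log: List[str]) -> None:
    """Execute one step of the highest-priority unfinished intention."""
    active = [i for i in intentions if i.next_step < len(i.steps)]
    if not active:
        return
    current = max(active, key=lambda i: i.priority)
    if current.next_step > 0 and not current.context(world):
        current.next_step = 0        # assumption: start over if context broke
    log.append(f"{current.goal}:{current.steps[current.next_step]}")
    current.next_step += 1


world = {"has_package": True}
log: List[str] = []
deliver = Intention("deliver", 1, ["pick", "carry", "drop"],
                    lambda w: w["has_package"])
intentions = [deliver]
run_cycle(intentions, world, log)                      # deliver starts
intentions.append(Intention("recharge", 5, ["dock", "charge"], lambda w: True))
for _ in range(4):
    run_cycle(intentions, world, log)
print(log)
```

The higher-priority `recharge` goal interrupts `deliver` after its first step, and `deliver` later resumes at `carry` because its context still holds.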

The UM-PRS interpreter controls the execution of the entire system. With this im- 
plementation, the interpreter facilitates switching to more important goals according to 
the situation. The UM-PRS interpreter can exhibit the following behaviors: 
- Incremental elaboration to identify appropriate actions.

- Recovery from a failed context. 

- Suspension of one goal and pursuit of another. 

- Interruption by new goals. 

- Refocus due to a change of context. 

3.8 JAM 

JAM [8] is an intelligent agent architecture that combines the best aspects of several 
leading-edge intelligent agent frameworks, including PRS, UM-PRS and the Structured 
Circuit Semantics (SCS) representation of Lee [10]. JAM inherits all of the UM-PRS 
features and refines them. 



Refined Semantics An agent’s goals are divided into two categories, top-level goals 
and subgoals, as in UM-PRS. JAM, however, provides refined subgoaling actions as 
follows: 

ACHIEVE: An achieve action causes the agent to establish a goal achievement subgoal 
for the currently executing plan. This then triggers the agent to search for plans in 
the plan library that can satisfy the goal given the current context. 

PERFORM: The agent checks whether the subgoal has already been accomplished. Only if the goal has not been accomplished does the plan subgoal.

The agent continually monitors for goal achievement. Typically, the plan selected 
for the subgoal will be the means by which the subgoal is accomplished. However, 
if the agent detects (opportunistic) accomplishment of the goal (perhaps by another 
agent), it will consider the subgoal action successful and discontinue execution of 
the plan established to achieve the subgoal. 

MAINTAIN: A maintain goal indicates that the specified goal must be reattained if it 
ever becomes unsatisfied. A maintain goal is very similar to an achieve goal except 
that a maintain goal is never removed from the agent’s goal list automatically. The 
only way that a maintain goal is removed from the agent’s intention structure is to 
perform an explicit unpost action. 

WAIT: The wait action causes plan execution to pause until the specified goal is achieved 
or the specified action returns successfully. Execution of the plan continues in place, 
with the agent checking the goal or action every cycle through the interpreter. This 
action never fails. 
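Two of the behaviors above can be sketched in Python. One is the check-then-monitor subgoaling, where the plan is skipped if the goal already holds and abandoned as soon as the goal is observed to be true. The other is a maintain goal that is re-achieved whenever it becomes unsatisfied. The functions `subgoal` and `maintain`, the one-step-per-cycle model and the door example are hypothetical simplifications. JAM itself is a Java system with its own plan language, and this is not its API.

```python
from typing import Callable, Dict, List

World = Dict[str, bool]
Goal = Callable[[World], bool]
Step = Callable[[World], None]


def subgoal(goal: Goal, plan: List[Step], world: World) -> bool:
    """Skip the plan if the goal already holds; otherwise run it one step per
    cycle, stopping as soon as the goal is observed to be true."""
    if goal(world):
        return True
    for step in plan:
        step(world)
        if goal(world):
            return True              # achieved, possibly opportunistically
    return False


def maintain(goal: Goal, plan: List[Step], world: World,
             disturbances: List[Step]) -> int:
    """Re-achieve the goal whenever it becomes unsatisfied. The goal stays
    posted for every cycle here; JAM removes it only on an explicit unpost.
    Returns how many times the plan had to be re-run."""
    reruns = 0
    for disturb in disturbances:     # one external event per cycle
        disturb(world)
        if not goal(world):
            reruns += 1
            subgoal(goal, plan, world)
    return reruns


calls: List[str] = []


def ask_helper(w: World) -> None:
    calls.append("ask_helper")
    w["door_open"] = True            # another agent opens the door meanwhile


def push_door(w: World) -> None:
    calls.append("push_door")
    w["door_open"] = True


def door_open(w: World) -> bool:
    return w["door_open"]


world = {"door_open": False}
print(subgoal(door_open, [ask_helper, push_door], world), calls)
```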

A plan in JAM defines a procedural specification for accomplishing a goal. Its ap- 
plicability is limited to a particular goal or data-driven conclusion, and may be further 
constrained to a certain precondition and maintained context. 

Plan Precondition: The optional precondition field specifies the initial conditions that 
must be met before the plan should be considered for execution. 

Plan Context: A plan’s context specifies one or more expressions that describe the 
conditions under which the plan will be useful throughout the duration of plan 
execution. 
Goal-driven and Data-driven Behavior JAM agents can exhibit both goal-driven and
data-driven behavior.

Plan Goal: Goal-driven behavior is specified by including a goal field in a plan. This 
field’s contents specify the goal or activity that successful execution of the plan’s 
procedural body will accomplish. During execution, the interpreter matches this 
goal expression against the agent’s top-level goals. If the plan’s goal specification 
matches and if the plan’s precondition and context expressions pass, the plan may 
then be instantiated and intended. 

Plan Conclude: Data-driven behavior is indicated by using a conclude field in a plan. 
This specifies a World Model relation that should be monitored for change. If the 
given World Model relation changes in some way (i.e., it is asserted or updated), 
the agent’s interpreter considers the plan for execution. If the plan’s precondition 
and context expressions pass, the plan may then be instantiated (i.e., have values 
from the current situation assigned to plan variables) and intended (i.e., added to the 
agent’s intention structure). 
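Putting the goal, conclude, precondition and context fields together, plan applicability can be sketched as a filter over the plan library. The `Plan` dataclass, the `applicable` function and the example plans are hypothetical, and this is not JAM's syntax. The sketch only checks the context at selection time, although a JAM plan's context must keep holding throughout execution.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

World = Dict[str, float]


@dataclass
class Plan:
    name: str
    goal: Optional[str] = None        # goal-driven trigger
    conclude: Optional[str] = None    # data-driven trigger: World Model relation
    precondition: Callable[[World], bool] = lambda w: True   # checked initially
    context: Callable[[World], bool] = lambda w: True        # should hold throughout


def applicable(plans: List[Plan], goals: List[str], changed: List[str],
               world: World) -> List[str]:
    """Names of plans that may be instantiated and intended this cycle."""
    selected = []
    for p in plans:
        goal_match = p.goal is not None and p.goal in goals
        data_match = p.conclude is not None and p.conclude in changed
        if (goal_match or data_match) and p.precondition(world) and p.context(world):
            selected.append(p.name)
    return selected


plans = [
    Plan("drive-home", goal="at-home", precondition=lambda w: w["battery"] > 20),
    Plan("log-temperature", conclude="temperature"),
    Plan("limp-home", goal="at-home", context=lambda w: w["fuel"] < 10),
]
world = {"battery": 50.0, "fuel": 30.0}
print(applicable(plans, goals=["at-home"], changed=["temperature"], world=world))
```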



Checkpointing and Mobility JAM implements functionality for capturing the runtime state of the agent in the middle of execution and for subsequently restoring the agent to that captured state.

One use of this functionality is periodically saving the agent's state so that it can be restored if the agent fails unexpectedly. Another use is to implement agent mobility, where the agent migrates from one computer platform to another. A third possible use is to clone an agent by creating a checkpoint and restoring it as a new agent without terminating the original agent.
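The three uses of checkpointing all rest on one pair of operations: serialize the runtime state, and later rebuild an independent agent from it. The sketch below is hypothetical throughout. The `AgentState` fields and the `checkpoint`/`restore` functions are invented for illustration, and JSON is used only to keep the example self-contained. JAM is written in Java, and its actual checkpoint format is not described here.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class AgentState:
    beliefs: Dict[str, str] = field(default_factory=dict)
    intentions: List[str] = field(default_factory=list)
    cycle: int = 0


def checkpoint(state: AgentState) -> str:
    """Capture the runtime state as a string (could be saved or sent elsewhere)."""
    return json.dumps(asdict(state))


def restore(blob: str) -> AgentState:
    """Rebuild an independent agent state from a checkpoint."""
    return AgentState(**json.loads(blob))


original = AgentState({"door": "open"}, ["deliver-package"], cycle=42)
blob = checkpoint(original)        # e.g. periodic save, or ship to another host
clone = restore(blob)              # cloning: the original keeps running
clone.intentions.append("explore")
print(original.intentions, clone.intentions)
```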



4 Discussion 

As a reactive-system agent architecture, UM-PRS satisfies most of the required features, such as real-time interruptible execution and multiple foci of attention. It also naturally supports hierarchical plan refinement and revision, purposeful behavior, and adherence to predefined strategies. In particular, in the SSA project [2], UM-PRS demonstrated an effective task migration capability. JAM inherits all of the above capabilities and additionally supports explicit data-driven behavior, checkpointing, and mobility.

The explicit strategy articulation capability is strengthened by incorporating the SCS execution semantics [10] into JAM, and it is also partly supported by JAM's refined goal actions and plan conditions.

Other capabilities, such as situation summary/report and restrained reactivity, are especially required for coordinated agent plan execution and are being studied in the context of explicit specification of execution semantics in the agent plan [11].

Note that it is not accidental that UM-PRS and JAM have more features than the other systems in Table 2: UM-PRS and its successor, JAM, have evolved over time to meet the very requirements used for the comparison in this paper.

Our approach to the evaluation of agent architectures is similar to Müller's [15] in that both are based on tasks and applications. However, Müller's approach focuses on classifying agent architectures and on choosing among existing agent architectures for a specific application. Our approach, on the other hand, emphasizes the desired features for the application and thus also provides guidelines for the development of agent architectures.

Wallace and Laird’s approach [16] is completely different in that it emphasizes 
quantitative as well as qualitative measures between pairs of agent architectures. Howe- 
ver, their current measures are applicable only to a limited set of agent architectures and 
capabilities. 



5 Conclusion 

In this paper, we surveyed various reactive-system approaches to agent architectures and performed a comparative analysis of the architectures based on our experience in building agent architectures. We first identified the common tasks and environments in which agents need to operate and then characterized the desired features of agent architectures. These characteristics were then checked across selected agent architectures. We also showed how our own agent architectures have been progressively refined and extended to meet the need for the identified features. Although our work is based on a limited range of tasks and a bounded number of reactive agent architectures, we believe that it can be a stepping stone toward the evaluation of agent architectures.
Abstract. In the RETSINA multi-agent system, each agent is provided with an internal planning component, HITaP. Each agent, using its internal planner, formulates detailed plans and executes them to achieve local and global goals. Knowledge of the domain is distributed among the agents, so each agent has only partial knowledge of the state of the world. Furthermore, the domain changes dynamically, so the knowledge available might become obsolete.

To deal with these issues, each agent's planner allows it to interleave planning with the execution of information-gathering actions. This lets the agent overcome its partial knowledge of the domain and acquire the information needed to complete and execute its plans. Information necessary for an agent's local plan can be acquired through cooperation, by the local planner firing queries to other agents and monitoring for their results. In addition, the local planner deals with the dynamism of the domain by monitoring it to detect changes that can affect plan construction and execution. Teams of agents, each of which incorporates a local RETSINA planner, have been implemented. These agents cooperate to solve problems in different domains that range from portfolio management to command and control decision support systems.¹



1 Introduction 

We are developing the RETSINA² Multi-Agent System (MAS) [17], a system in which agents exchange services, goals, and information with other agents and human users. RETSINA agents are deployed in dynamic environments and have only partial knowledge of the world in which they operate. However, they can take advantage of the intrinsic distribution of a MAS and gather knowledge that is distributed across all agents in the system.

To satisfy their goals, agents need to formulate and execute plans. However, the distribution of capabilities and limited information usually prevent the creation of a comprehensive plan within a single agent. Instead, agents are likely to subcontract services to other agents in the system, leading to the implicit construction of a distributed plan. Yet distributed planning brings other problems, such as conflicts between the local plans constructed by different agents. Moreover, a subcontracting mechanism is required through which an agent can suspend its planning or execution while waiting for other agents to complete their plans and provide results.

1 This research has been sponsored in part by the Office of Naval Research grant N-00014-96-16-1-1222 and by DARPA grant F-30602-98-2-0138.

2 REusable Task Structure based Intelligent Network Agents. 



N.R. Jennings and Y. Lespérance (Eds.): Intelligent Agents VI, LNAI 1757, pp. 147-161, 2000.
© Springer-Verlag Berlin Heidelberg 2000
These unique requirements of planning within an open, dynamic MAS (and in particular in RETSINA) make existing planners difficult to use. Although research on planning has dealt with most of the problems listed above, no planner addresses all of them at once. Planners deal with partial knowledge either by planning for contingencies (e.g., [13]) or by gathering information during planning (e.g., [7,5]). Other planners deal with uncertainty in the domain by using probabilistic models [8] or by reacting to the environment in which they operate [9,4]. Open, dynamic multi-agent systems require that partial knowledge and dynamism be handled simultaneously.

In this paper we present HITaP³, a planner that is part of the architecture of RETSINA agents. HITaP assumes that agents have only partial knowledge of the domain, and it supports interleaving of planning with the execution of information-gathering actions. In addition, HITaP supports two ways of gathering information: by direct inspection of the domain and by querying other agents. The second modality takes advantage of the intrinsic parallelism of the multi-agent system. Multiple agents work toward the accomplishment of a common goal, and each agent can develop local plans while waiting for other agents to compute requested information. Finally, HITaP deals with dynamism in the domain by monitoring changes in the environment and predicting their effect on the plan.
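The benefit of querying other agents is that local planning need not stop while a query is outstanding. The sketch below shows only that interleaving. The planner fires a query, keeps refining the parts of the plan that do not depend on the answer, and blocks only when the value is actually needed. The names `price_agent` and `plan_purchase`, the simulated delay and the price table are hypothetical. A thread pool stands in for a remote agent reached via KQML messages.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def price_agent(product: str) -> float:
    """Stand-in for another agent answering a query (hypothetical)."""
    time.sleep(0.1)                       # simulated remote delay
    return {"widget": 12.5}.get(product, 0.0)


def plan_purchase(product: str) -> List[str]:
    steps: List[str] = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        price_query = pool.submit(price_agent, product)   # fire the query
        # Keep planning the parts that do not depend on the answer.
        steps.append("select-vendor")
        steps.append("prepare-payment")
        price = price_query.result()      # block only when the value is needed
        steps.append(f"pay {price:.2f}")
    return steps


print(plan_purchase("widget"))  # ['select-vendor', 'prepare-payment', 'pay 12.50']
```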



2 The RETSINA Architecture 

RETSINA is an open multi-agent system that provides infrastructure for different types of deliberative, goal-directed agents. In this sense, the architecture of RETSINA agents [17] exhibits some of the ideas of BDI agents [15,11]. RETSINA agents are composed of four autonomous functional modules: a communicator, a planner, a scheduler, and an execution monitor. The communicator module receives requests from users or other agents in KQML format and transforms these requests into goals; it also sends out requests and replies. The planner module transforms goals into plans that solve those goals. Executable actions in the plans are scheduled for execution by the scheduler module. The actions are executed, and their execution is monitored, by the execution monitor module. The four modules of a RETSINA agent are implemented as autonomous threads of control, so that planning can proceed concurrently with the scheduling and execution of actions. Furthermore, actions are also executed as separate threads and can run concurrently. In general, concurrency between actions is not merely virtual: since some actions require the agent to ask other agents for services, and since these agents run on remote hosts, actual parallelism is achieved.
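The module-per-thread organization can be sketched as a pipeline of threads connected by queues. The function names, the string messages and the `None` end-of-stream sentinel are hypothetical conveniences. Planning and scheduling are reduced to trivial pass-through steps. Unlike RETSINA, actions are executed inline by the monitor thread rather than each in its own thread.

```python
import queue
import threading
from typing import List


def communicator(requests: List[str], objectives: queue.Queue) -> None:
    for req in requests:                    # e.g. decoded KQML requests
        objectives.put(f"goal:{req}")
    objectives.put(None)                    # sentinel: no more requests


def planner(objectives: queue.Queue, tasks: queue.Queue) -> None:
    while (goal := objectives.get()) is not None:
        tasks.put(f"task-for-{goal}")       # trivial one-task "plan"
    tasks.put(None)


def scheduler(tasks: queue.Queue, ready: queue.Queue) -> None:
    while (task := tasks.get()) is not None:
        ready.put(task)                     # FIFO; real scheduling is richer
    ready.put(None)


def execution_monitor(ready: queue.Queue, results: List[str]) -> None:
    while (action := ready.get()) is not None:
        results.append(f"done {action}")    # execute and record the result


def run_agent(requests: List[str]) -> List[str]:
    objectives: queue.Queue = queue.Queue()
    tasks: queue.Queue = queue.Queue()
    ready: queue.Queue = queue.Queue()
    results: List[str] = []
    modules = [
        threading.Thread(target=communicator, args=(requests, objectives)),
        threading.Thread(target=planner, args=(objectives, tasks)),
        threading.Thread(target=scheduler, args=(tasks, ready)),
        threading.Thread(target=execution_monitor, args=(ready, results)),
    ]
    for m in modules:
        m.start()
    for m in modules:
        m.join()
    return results


print(run_agent(["get-quote", "buy-stock"]))
```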

The following data stores are part of the architecture of each individual RETSINA agent and are used by HITaP. Their role in the overall architecture of the agent is displayed in Figure 1.

- The objective-DB is a dynamic store that holds the objectives of the agent of which it is a component. The objective-DB implements a priority queue, i.e., the objective with the highest priority on the queue is handled first by the planner. New objectives are inserted in the queue by the communicator, and by the planner when complex objectives are decomposed into simpler objectives.

3 Hierarchical Task Planner

Fig. 1. The RETSINA planning architecture.

- The task-DB is a dynamic data store that holds the plan. Tasks are added by the 
planner when it recognizes that they contribute to the achievement of the objectives. 
Tasks are removed by the scheduler when they are ready for execution. 

- The task schema library is a static data store that holds task schemas. These are used by the planner for task instantiation.

- The task reduction library is a static data store that holds reductions of tasks. These 
are used by the planner for task decomposition. 

- The beliefs-DB is a dynamic data store that maintains the agent's knowledge of the domain in which the plan will be executed. The planner uses the beliefs-DB during planning as a source of facts that affect its planning decisions. Actions may affect the beliefs-DB by changing facts in the domain.
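The objective-DB's "highest priority first" behavior maps directly onto a binary heap. The following is a minimal sketch built on Python's `heapq`. The class name `ObjectiveDB`, the integer priorities and the first-in-first-out tie-break between equal priorities are assumptions, since the text does not specify how ties are resolved.

```python
import heapq
import itertools
from typing import List, Tuple


class ObjectiveDB:
    """Priority queue of objectives; the highest priority is handled first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._order = itertools.count()     # FIFO tie-break (an assumption)

    def insert(self, objective: str, priority: int) -> None:
        # heapq is a min-heap, so the priority is stored negated.
        heapq.heappush(self._heap, (-priority, next(self._order), objective))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


db = ObjectiveDB()
db.insert("monitor-portfolio", priority=1)   # e.g. from the communicator
db.insert("answer-user-query", priority=5)
db.insert("get-stock-quote", priority=5)     # e.g. from a decomposition
print([db.pop() for _ in range(len(db))])
# ['answer-user-query', 'get-stock-quote', 'monitor-portfolio']
```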

The RETSINA agent architecture has been the inspiration for DECAF [6]. There are few differences between the two architectures: DECAF adds an initialization module that is left implicit in RETSINA. The real difference between the two systems lies in the tools implemented for DECAF to facilitate agent creation. Research in the RETSINA project, by contrast, has led toward extending the modules to support the construction of more complex agents.
Fig. 2. A Hierarchical Task Network



3 The Planner Module 

The RETSINA Planner represents tasks using the Hierarchical Task Network (HTN) formalism [3]. Figure 2 displays the structure of an HTN. It consists of nodes that represent tasks and two types of edges. Reduction links describe the decomposition of a high-level task into subtasks (a tree structure). These links are used to select the tasks that belong to the decomposition of the parent task. Provision/outcome links, the other type of edge, are used for value propagation between task nodes: they describe how the result of one task is propagated to other tasks. For instance, in Figure 3, the task $T$ represents the act of buying a product. $T$ may decompose into finding the price ($T_1$) and performing the transaction ($T_2$). The reduction requires that $T_1$ be executed first, to propagate the price outcome to $T_2$.




Fig. 3. An example of task decomposition 



Formally, the tuple $\langle \mathcal{A}, \mathcal{C}, \mathcal{R}, \mathcal{B}, \mathcal{O}, \mathcal{T} \rangle$ describes a problem for the RETSINA
Planner, where $\mathcal{A}$ is a set of actions (primitive tasks) that the agent can perform directly and
$\mathcal{C}$ is a set of complex tasks that are implemented by the composition of actions and other
complex tasks. $\mathcal{R}$ is a set of reduction schemas, where each reduction schema provides
details on how complex tasks are implemented. A reduction schema for a complex
task $c \in \mathcal{C}$ specifies the list of tasks that realizes $c$, and how their preconditions and
effects are related to $c$’s preconditions and effects. In general, there may be several
reduction schemas in $\mathcal{R}$ for the same task in $\mathcal{C}$; each reduction schema corresponds to
one implementation of this task. $\mathcal{B}$ contains the set of beliefs of the agent. $\mathcal{B}$ is not a
static list: it changes depending on the results of the agent’s actions and the
information gathered. $\mathcal{O}$ is the objective-DB, which holds objectives not yet achieved
by the agent; the goal of the planner is to achieve all the objectives in this list. $\mathcal{T}$ is
the task-DB which, by holding the tasks already added to the plan, describes the plan
constructed by the agent.
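As a reading aid, the problem tuple can be written down as a plain data structure. The Python sketch below is only an illustration: the names `PlanningProblem`, `ReductionSchema` and `reductions_for` are hypothetical, not RETSINA’s actual API, and each reduction schema is simplified to a task plus the list of subtasks that realize it.

```python
from dataclasses import dataclass, field


@dataclass
class ReductionSchema:
    task: str              # the complex task in C that this schema reduces
    subtasks: list         # actions and complex tasks that together realize it


@dataclass
class PlanningProblem:
    actions: set           # A: primitive tasks the agent can perform directly
    complex_tasks: set     # C: tasks implemented by composition
    reductions: list       # R: reduction schemas, possibly several per task
    beliefs: dict = field(default_factory=dict)     # B: updated as actions run
    objectives: list = field(default_factory=list)  # O: objective-DB
    task_db: list = field(default_factory=list)     # T: tasks already in the plan

    def reductions_for(self, task):
        """Return every schema for `task`; each one is an alternative implementation."""
        return [r for r in self.reductions if r.task == task]


# Hypothetical instance based on the Buy Product example of Figure 3.
problem = PlanningProblem(
    actions={"FindPrice", "PerformTransaction"},
    complex_tasks={"BuyProduct"},
    reductions=[ReductionSchema("BuyProduct", ["FindPrice", "PerformTransaction"])],
    objectives=["BuyProduct"],
)
```

Keeping $\mathcal{R}$ as a flat list makes the "several schemas per task" case explicit: `reductions_for` may return more than one alternative.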



RETSINA-Planner (goal)

    init-plans ← make initial plans for goal
    partial-plans ← init-plans

    while partial-plans is not empty do:
        remove a partial plan P from partial-plans
        if (P has no flaws)
            then return P
        else do:
            remove a flaw f from P’s objective-DB
            add the refinements of f in P to partial-plans
    return failure



Fig. 4. The Basic HITaP Planning Algorithm 



The detailed planning algorithm is described in Figure 4. It starts from an initial
set of plans (init-plans) that provide alternative hypotheses of solutions to the original
goal. Initial plans are constructed by matching tasks to the initial objectives. The planner
proceeds by selecting a partial plan P and a flaw f from P’s objective-DB, and it generates a
new partial plan for each possible solution of f. This process is repeated until the planner
generates a plan with an empty objective-DB. The planner fails if the list of partial plans
empties before a solution plan is found.
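The loop of Figure 4 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: a partial plan is reduced to a tuple of tasks and a tuple of outstanding flaws; LIFO selection (depth-first) stands in for the agent-supplied heuristic; and the reduction library, `refine` function and action names such as `PayByCard` are hypothetical toys, not part of HITaP.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class PartialPlan:
    tasks: tuple   # tasks added to the plan so far (task-DB)
    flaws: tuple   # outstanding flaws (objective-DB)


def plan(init_plans, refine: Callable) -> Optional[PartialPlan]:
    """Flaw-driven search in the style of Figure 4.

    `refine(f, p)` returns the refinements of flaw f in plan p, where f has
    already been removed from p's flaws.
    """
    partial_plans = list(init_plans)
    while partial_plans:
        p = partial_plans.pop()                        # LIFO: assumed heuristic
        if not p.flaws:
            return p                                   # empty objective-DB: solution
        f, remaining = p.flaws[0], p.flaws[1:]
        # Every refinement is kept as an alternative; an empty result just
        # drops this branch, so the search falls back on the other plans.
        partial_plans.extend(refine(f, PartialPlan(p.tasks, remaining)))
    return None                                        # failure


# Toy reduction library and execution outcomes (hypothetical names).
LIBRARY = {"BuyProduct": [["FindPrice", "PerformTransaction"], ["FindPrice", "PayByCard"]]}
FAILING_ACTIONS = {"PayByCard"}


def refine(f, p):
    if f in LIBRARY:                                   # reduction flaw: one plan per schema
        return [PartialPlan(p.tasks + tuple(s), p.flaws + tuple(s)) for s in LIBRARY[f]]
    if f in FAILING_ACTIONS:                           # action failed: no refinement
        return []
    return [p]                                         # action succeeded: flaw resolved


result = plan([PartialPlan((), ("BuyProduct",))], refine)
```

With LIFO selection the `PayByCard` alternative, added last, is expanded first; its failure empties that branch, and the loop backtracks to the remaining plan, whose tasks are `("FindPrice", "PerformTransaction")`.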

The resulting plan is a tree of partially ordered tasks whose leaf nodes
are actions in $\mathcal{A}$ and whose internal nodes are complex tasks in $\mathcal{C}$. At execution time,
actions are scheduled for execution and eventually mapped to methods, which in
turn are executed by the agent’s execution monitor. The scheduler uses the complex tasks
in the plan to synchronize the execution of primitive tasks and to connect the outcomes
of completed tasks to the preconditions of tasks that have not yet been executed.



3.1 Flaw refinement 

The flaw refinement algorithm is shown in Figure 5. The RETSINA Planner allows three 
different types of flaws: task-reduction flaws, suspension flaws, and execution flaws. 
Task-reduction flaws are associated with unreduced complex tasks in the task-DB. They
are used to signal which tasks in the current partial plan should be reduced. Once a
reduction flaw is selected, the planner applies all task reduction schemas in $\mathcal{R}$ associated
with the task, generating one new partial plan for each schema applied. As a result, all
the subtasks listed in the reduction schema are added to the partial plan’s task-DB $\mathcal{T}$.
Task reduction triggers the evaluation of constraints and estimators
that are associated with the task being reduced, which in turn could trigger the execution
of actions that inspect the environment and provide information that is not present in $\mathcal{B}$.



refinements of f in P

    new-plans ← ∅
    if f is a reduction flaw then
        t ← the task corresponding to f
        evaluate estimators and constraints of t
        for each reduction r of t do
            add (apply r to P) to new-plans
    if f is a suspension flaw then
        add f to the flaws of P
        add P to new-plans
    if f is an execution flaw then
        a ← the action corresponding to f
        if a completed successfully
            add P to new-plans
        if a failed
            new-plans ← ∅
        if a is still running
            add f to the flaws of P
            add P to new-plans
    return new-plans



Fig. 5. The Refinement Algorithm 



Execution flaws are used to monitor the execution of actions while planning. An
execution flaw is created and added to the objective-DB $\mathcal{O}$ whenever an action is created.
Execution flaws are removed from $\mathcal{O}$ only when the corresponding action terminates.
How they are resolved depends on how the action terminates: if the action terminates
successfully, then the flaw is simply removed from the list of flaws and no action is taken;
otherwise, when the execution fails or times out, the partial plan also fails and the planner
backtracks.

Suspension flaws are used to signal that the partial plan contains unreduced complex
tasks whose reduction depends on data that is not currently available to the agent.
Suspension flaws are delayed and transformed into reduction flaws only after the occurrence
of an unsuspending event, such as the successful completion of the execution of an action.
Unsuspending events provide the data that the planner was waiting for, and they allow
the reduction of the complex task to be completed.
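The three cases of Figure 5 can be mirrored in a small Python function. The sketch below is hypothetical: plans and flaws are immutable records, the `REDUCTIONS` table and the `action_status` dictionary stand in for the reduction library and the execution monitor, and estimator and constraint evaluation is only marked by a comment.

```python
from dataclasses import dataclass, replace
from enum import Enum, auto


class Status(Enum):
    SUCCEEDED = auto()
    FAILED = auto()
    RUNNING = auto()


@dataclass(frozen=True)
class Flaw:
    kind: str      # "reduction", "suspension" or "execution"
    target: str    # the task or action the flaw refers to


@dataclass(frozen=True)
class Plan:
    tasks: tuple
    flaws: tuple


# Hypothetical stand-ins for the reduction library and the execution monitor.
REDUCTIONS = {"Move": [("PlanRoute", "Drive")]}
action_status = {}


def refinements(f, p):
    """Refine flaw f of plan p (f already removed from p.flaws), as in Figure 5."""
    new_plans = []
    if f.kind == "reduction":
        # Estimators and constraints of f.target would be evaluated here.
        for subtasks in REDUCTIONS.get(f.target, []):
            new_plans.append(replace(p, tasks=p.tasks + subtasks))
    elif f.kind == "suspension":
        new_plans.append(replace(p, flaws=p.flaws + (f,)))      # still waiting for data
    elif f.kind == "execution":
        status = action_status.get(f.target, Status.RUNNING)
        if status is Status.SUCCEEDED:
            new_plans.append(p)                                 # flaw resolved
        elif status is Status.FAILED:
            new_plans = []                                      # plan fails: backtrack
        else:
            new_plans.append(replace(p, flaws=p.flaws + (f,)))  # check again later
    return new_plans
```

Returning an empty list for a failed action is what makes the planner backtrack: the failed plan simply produces no successors.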
4 Task and Plan Representation

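A task is a tuple $\langle \mathcal{N}, \mathcal{P}_{ar}, \mathcal{D}_{par}, \mathcal{P}_{ro}, \mathcal{O}_{ut}, \mathcal{C}, \mathcal{E} \rangle$ where $\mathcal{N}$ is a unique identifier of the
task; $\mathcal{P}_{ar}$, $\mathcal{D}_{par}$ and $\mathcal{P}_{ro}$ are different types of preconditions, discussed below; $\mathcal{O}_{ut}$ is
the set of outcomes of the task; $\mathcal{C}$ is a set of constraints that should hold either before,
after or during the execution of the task; and $\mathcal{E}$ is a set of estimators used by the planner to
predict the effects of the task on some variables. An example of a task is shown in Figure
6. In this example, $\mathcal{N} = \text{Buy Product}$, $\mathcal{P}_{ar} = \{\text{Balance}\}$, $\mathcal{P}_{ro} = \{\text{Expenses}\}$, $\mathcal{O}_{ut} =
\{\text{PurchaseDone}\}$, $\mathcal{C} = \{\text{Balance} > 0\}$, and $\mathcal{E} = \{\text{Balance} = \text{Balance} - \text{Expenses}\}$.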



[Figure 6: Buy Product, with parameter Balance and provision Expenses (left), outcome PurchaseDone (right), estimator E: Balance=Balance-Expenses and constraint C: Balance>ProductPrice (bottom)]



Fig. 6. The Buy Product task has a provision and a parameter (on the left), an outcome (on the 
right), and an estimator and a constraint (denoted E and C, at the bottom). 



4.1 Estimators and Constraints 

Estimators are used to predict the value of a variable after the task is performed.
Constraints restrict the values of variables to a specified range, outside of which the
plan would fail at execution time. We use constraints and estimators
to evaluate the amount of resources needed to perform a task and to verify whether these
resources are available to the agent.
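The Buy Product task of Figure 6 gives a concrete case of how estimators and constraints work together. The Python sketch below is illustrative only: the `Task` class and its methods are hypothetical, and it assumes that constraints are checked against the values predicted by the estimators (one of the "before, after or during" options mentioned above), which is how resource availability is verified in this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Env = Dict[str, float]


@dataclass
class Task:
    name: str
    parameters: set
    provisions: set
    outcomes: set
    constraints: List[Callable[[Env], bool]] = field(default_factory=list)
    estimators: Dict[str, Callable[[Env], float]] = field(default_factory=dict)

    def estimate(self, env: Env) -> Env:
        """Predict variable values after the task runs, using its estimators."""
        predicted = dict(env)
        for var, estimator in self.estimators.items():
            predicted[var] = estimator(env)
        return predicted

    def resources_available(self, env: Env) -> bool:
        """Check every constraint against the predicted values."""
        predicted = self.estimate(env)
        return all(constraint(predicted) for constraint in self.constraints)


buy_product = Task(
    name="Buy Product",
    parameters={"Balance"},
    provisions={"Expenses"},
    outcomes={"PurchaseDone"},
    constraints=[lambda env: env["Balance"] > 0],
    estimators={"Balance": lambda env: env["Balance"] - env["Expenses"]},
)
```

With a balance of 100 and expenses of 40, the estimator predicts a balance of 60 and the constraint holds; with a balance of 30 the predicted balance is negative and the task is rejected before execution.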

4.2 Preconditions and Outcomes 

HITaP’s task representation distinguishes between three types of preconditions:
Parameters, Dynamic Parameters and Provisions. Parameters contain beliefs; as such, they
represent conditions that the agent expects to be true in its environment. Dynamic
Parameters are parameters that are modified by the execution of a task in the plan. For
example, in a plan that involves moving vehicles, origin and destination are parameters,
while the location of the vehicles and the amount of fuel are dynamic parameters,
since they are modified by the tasks in the plan. Parameters and dynamic parameters are
used by the planner to verify whether the constructed plan is valid; that is, they are
used to evaluate the estimators and to verify all constraints.
Provisions are the third type of preconditions. They are “execution time”
preconditions, in the sense that a task can be executed only when all its provisions are
satisfied. Provisions work together with outcomes: when an action is executed, its outcomes
are established. The values of the outcomes are transmitted through provision links to the
provisions of other tasks in the plan, establishing the execution conditions of those tasks.
For example, the outcome Path of the task Select Path in Figure 8 is linked to the
provision Path in Ask Fuel Consumption. The task Ask Fuel Consumption can be
executed only after the outcome of Select Path is successfully established.
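The execution-time role of provisions can be sketched as a small dependency network: establishing an outcome fills the linked provisions, and a task becomes executable once all of its provisions are filled. The Python sketch below is a hypothetical illustration, not RETSINA code; the `ProvisionNetwork` class and the `HandlePathFailure` task are invented names. The second link also hints at contingency handling: a task wired to a failure outcome stays idle unless that outcome is established.

```python
from collections import defaultdict


class ProvisionNetwork:
    """Tracks provision links and reports which tasks become executable."""

    def __init__(self):
        self.required = {}                  # task -> names of its provisions
        self.filled = defaultdict(dict)     # task -> {provision: value}
        self.links = defaultdict(list)      # (task, outcome) -> [(task, provision)]

    def add_task(self, task, provisions):
        self.required[task] = set(provisions)

    def link(self, src_task, outcome, dst_task, provision):
        self.links[(src_task, outcome)].append((dst_task, provision))

    def is_executable(self, task):
        return self.required[task].issubset(self.filled[task])

    def establish(self, task, outcome, value):
        """Establish an outcome of `task`; return the tasks that just became executable."""
        ready = []
        for dst, provision in self.links[(task, outcome)]:
            self.filled[dst][provision] = value
            if self.is_executable(dst):
                ready.append(dst)
        return ready


net = ProvisionNetwork()
net.add_task("AskFuelConsumption", {"Path"})
net.add_task("HandlePathFailure", {"Error"})        # hypothetical failure handler
net.link("SelectPath", "Path", "AskFuelConsumption", "Path")
net.link("SelectPath", "Failure", "HandlePathFailure", "Error")
```

Calling `net.establish("SelectPath", "Path", ...)` makes AskFuelConsumption executable, while the failure handler remains blocked because its provision was never filled.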

The plan representation described above has two advantages: it allows a natural
representation of contingency plans, and it allows the representation of plans in which
two concurrent steps use the same resource. Contingency plans are created by using
different outcomes for the different contingencies. For instance, tasks commonly have
at least two outcomes: one for success, the other for failure. The failure of the execution
of a task triggers the establishment of the failure outcome, which in turn triggers the
execution of the tasks in the plan that handle the failure; these tasks would not be
executed otherwise.

Using temporal constraints, rather than causal constraints [10], to define the control
flow of the plan allows the construction of plans that could not be constructed otherwise.
Consider the following two steps of a plan for vehicle movement: RunAirConditioner and
GoToX, both of which consume fuel. A causal-link representation adds a link between the
steps, thus imposing an order between them. As a result, they have to be executed
sequentially: either the agent first travels in a hot environment and runs the air
conditioner only when it arrives, or it cools down first and then rushes, hoping that the
temperature stays comfortable. HITaP can generate a plan that overcomes this problem: it
uses estimators to evaluate the fuel needed for both actions, and constraints to make sure
that there is enough fuel for both of them. The temporal relation between the steps does
not matter.
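The fuel argument above reduces to simple arithmetic. In this hypothetical Python sketch (the consumption figures and function names are invented for illustration), the estimator sums the fuel used by all steps and the constraint compares the remainder with a reserve; because a sum does not depend on order, no ordering link between RunAirConditioner and GoToX is required.

```python
def fuel_after(consumption, fuel):
    """Estimator: fuel left once every step has run, in whatever order."""
    return fuel - sum(consumption.values())


def enough_fuel(consumption, fuel, reserve=0.0):
    """Constraint: the steps together must not use more than the available fuel."""
    return fuel_after(consumption, fuel) >= reserve


# Hypothetical consumption figures for the two concurrent steps.
steps = {"RunAirConditioner": 5.0, "GoToX": 30.0}
```

With 40 units of fuel, the two steps leave 5 units and the constraint holds; with 30 units it fails, whichever step would have run first.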

4.3 Task Reduction Schemas 

Task reduction schemas are used to describe how complex tasks are implemented by
composition of other tasks. The tuple $\langle \mathcal{N}_{task}, \mathcal{T}_{list}, \mathcal{I}_{links}, \mathcal{P}_{links}, \mathcal{O}_{links} \rangle$ formally
represents a reduction schema. $\mathcal{N}_{task}$ is a unique identifier of the reduced task $t$; $\mathcal{T}_{list}$ is
a set of primitive and complex tasks that define a method to implement $t$. $\mathcal{I}_{links}$ contains
inheritance links that connect $t$’s provisions to the provisions of the children tasks in $\mathcal{T}_{list}$.
These links specify how the values of the provisions of the parent task $t$ become values
of the provisions of its children tasks (the members of $\mathcal{T}_{list}$). $\mathcal{P}_{links}$ specifies provision
links between sibling tasks in the decomposition. These links are used to maintain a
temporal order between tasks in the reduction. $\mathcal{O}_{links}$ is the set of outcome propagation
links that connect the outcomes of the children tasks in $\mathcal{T}_{list}$ to the outcomes of the
parent task. These links specify how the outcomes of the parent task $t$ are affected by
the outcomes of its children tasks.
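The three kinds of links in a reduction schema can be made concrete with the Buy Product decomposition of Figure 3. In the Python sketch below, the `ReductionSchema` class, its methods and the provision and outcome names (`Product`, `Price`, `PurchaseDone`) are hypothetical; the sketch only shows how inheritance links pass values down, how provision links induce a temporal order between siblings, and how outcome links pass results up.

```python
from dataclasses import dataclass


@dataclass
class ReductionSchema:
    task: str                 # N_task: identifier of the reduced task t
    subtasks: list            # T_list: children that implement t
    inheritance_links: list   # I_links: (parent provision, child, child provision)
    provision_links: list     # P_links: (child, outcome, sibling, provision)
    outcome_links: list       # O_links: (child, child outcome, parent outcome)

    def inherit(self, parent_values):
        """Copy the values of t's provisions into its children's provisions."""
        values = {child: {} for child in self.subtasks}
        for parent_prov, child, child_prov in self.inheritance_links:
            if parent_prov in parent_values:
                values[child][child_prov] = parent_values[parent_prov]
        return values

    def ordering(self):
        """Each provision link forces its source to run before its target."""
        return [(src, dst) for src, _, dst, _ in self.provision_links]

    def parent_outcomes(self, child_outcomes):
        """Map established child outcomes, keyed by (child, outcome), onto t's outcomes."""
        return {parent_out: child_outcomes[(child, out)]
                for child, out, parent_out in self.outcome_links
                if (child, out) in child_outcomes}


# The Buy Product reduction of Figure 3 (provision and outcome names are hypothetical).
buy = ReductionSchema(
    task="BuyProduct",
    subtasks=["FindPrice", "PerformTransaction"],
    inheritance_links=[("Product", "FindPrice", "Product"),
                       ("Product", "PerformTransaction", "Product")],
    provision_links=[("FindPrice", "Price", "PerformTransaction", "Price")],
    outcome_links=[("PerformTransaction", "PurchaseDone", "PurchaseDone")],
)
```

Here `buy.ordering()` yields the single constraint that FindPrice precedes PerformTransaction, which is exactly the order required in Figure 3.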

5 Interleaving Planning and Information Gathering 

Estimators and constraints should be evaluated before the plan is completed.
However, when an estimator needs the value of a provision $\pi$ that is not yet
set, the agent can either use its own sensors to find this information or query other agents
for the missing information. In either case, the completion of the plan is deferred until
the value of $\pi$ is provided.

The execution of information actions during planning is controlled by the suspension
algorithm (Figure 7). Since estimators and constraints are evaluated in the task reduction
step, the planner records that the reduction of a task $t$ is suspended by adding a new
task-reduction flaw $f$ for $t$, marked as suspended. The flaw $f$ records that $t$ is not reduced
yet and that the completion of the plan is deferred. Then, the planner looks for a primitive
task $t_\pi$ in the plan that, if executed, would set $\pi$; $t_\pi$ is found by tracking backward the
inheritance links and provision links that end in $\pi$. The task $t_\pi$ is then scheduled for
execution, and a new execution flaw $e$ to monitor the outcome of $t_\pi$ is added to the list of
plan $P$’s flaws.



suspension of t in P

    f ← task-reduction flaw for t
    add f to the flaws of P
    set f as suspended

    set unsuspension trigger to a provision π
    find the task t_π that sets π
    schedule t_π for execution
    e ← execution flaw for t_π
    add e to the flaws of P

Fig. 7. The Suspension Algorithm 



As described above, the flaws $f$ and $e$ are not removed from the list of flaws until $t_\pi$
completes its execution. The completion of $t_\pi$ removes the suspension on $f$, which in
turn allows $t$’s estimators and constraints to be evaluated and $t$ to be reduced.
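The suspension step of Figure 7, including the backward search for $t_\pi$, can be sketched in Python as follows. This is a hypothetical encoding, not the planner’s data model: every link (provision, inheritance or outcome) is stored as a map from its end point `(task, name)` to its start point, flaws are plain tuples, and tasks are marked primitive through a set. The example reproduces the Move / AskFuelConsumption case of Section 6.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Plan:
    # (task, name) -> (source task, source name) for every provision, inheritance
    # or outcome link ending at that port; this encoding is an assumption.
    incoming: dict
    primitive: set
    flaws: list = field(default_factory=list)       # (kind, task, suspended)
    scheduled: list = field(default_factory=list)


def find_setter(plan, task, provision) -> Optional[str]:
    """Track links backward from `provision` of `task` to a primitive task (t_pi)."""
    port, seen = (task, provision), set()
    while port in plan.incoming and port not in seen:
        seen.add(port)
        port = plan.incoming[port]
        if port[0] in plan.primitive:
            return port[0]
    return None


def suspend(plan, t, provision) -> bool:
    """Suspend the reduction of t until `provision` (pi) is set, as in Figure 7."""
    plan.flaws.append(("reduction", t, True))          # f, marked as suspended
    setter = find_setter(plan, t, provision)
    if setter is None:
        return False                                   # nothing in the plan sets pi
    plan.scheduled.append(setter)                      # schedule t_pi for execution
    plan.flaws.append(("execution", setter, False))    # e monitors t_pi
    return True


plan = Plan(
    incoming={("Move", "Consumption"): ("AskFuelConsumption", "Consumption"),
              ("AskFuelConsumption", "Path"): ("SelectPath", "Path")},
    primitive={"AskFuelConsumption", "SelectPath"},
)
```

The `seen` set only guards against cyclic link data; the backward walk passes through complex tasks (reached via inheritance or outcome links) until it reaches a primitive task.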

The use of suspension and monitoring flaws to control action execution has important
consequences. First and foremost, it closely ties action execution to planning: since a
plan is not complete until all flaws are resolved, the use of suspension and execution
flaws guarantees that all scheduled actions are successfully executed before the plan
is considered a solution of the problem. In addition, if an executing action fails, the
failure will be detected as soon as the planner refines the corresponding execution flaw.
Furthermore, using flaws to suspend and monitor action execution allows the planner to
work on other parts of the plan while it waits for the completion of information-gathering
actions.

Finally, a few things have to be noted. First, rather than executing actions while
planning, the agent could look for a plan that uses different information, thereby
eliminating the need to execute any action. It is up to the agent (or to its programmer) to
implement a heuristic function that chooses the plans to expand (see the HITaP planning
algorithm in Figure 4). Once the heuristic function selects which plan to work on next, the
planner is responsible only for the solution of flaws in the plan. Second, the suspension
algorithm described above can, in principle, be applied to actions that do not gather
information. There is a subtle problem, though: asking an agent to gather information
is likely to leave the state of the world unchanged, while asking another agent to do
something is guaranteed to modify the world. These changes are outside the scope of the
plan constructed (they are planned locally by the other agent), and therefore they could
invalidate the plan constructed so far.

Fig. 8. An example of a task reduction schema



6 Example 

In this example we show how the planner is used in RETSINA agents organized in a
multiagent system that supports joint mission planning⁴. In this scenario, three army
commanders discuss a rendezvous location for their platoons. Then, each commander
constructs its own plan, assisted by a planningAgent that finds a route, taking into
account fuel limitations, ground and weather conditions. The system includes, among
others, the following agents: the FuelExpertAgent, which computes how much fuel is
needed to accomplish a mission; the WeatherAgent, which provides weather forecasts;
and a Matchmaker, which matches service provider agents with consumer agents.

Each commander asks its planningAgent to find a route to the rendezvous point.
PlanningAgents transform the request into an objective that is achieved using the
reduction schema shown in Figure 8. Following the reduction schema, the plan adopted is
SelectPath, AskFuelConsumption, and Move. Since the three actions can be further
reduced, the planner adds three reduction objectives to the ObjectiveDB.

Following the reduction algorithm shown in Figure 5, reduction objectives trigger
the evaluation of the estimators associated with the action being reduced. The estimator
associated with Move depends on the value of the unknown provision Consumption,
which can be set only by executing AskFuelConsumption. The execution of a step while
planning is controlled by the suspension algorithm described in Figure 7. The planner
first suspends the reduction of Move until the provision Consumption is set; then it
schedules AskFuelConsumption for execution. Since AskFuelConsumption needs the
value of Path, SelectPath is also scheduled for execution. SelectPath is executed
directly by the agent, while AskFuelConsumption is a request for information to the
FuelExpertAgent. From the point of view of the planningAgent, there is no difference
between the two tasks: a request to another agent is an action like any other.

4 For more information on the system see http://www.cs.cmu.edu/softagents/muri.html

Fig. 9. An example of a task reduction schema

The FuelExpertAgent computes the expected fuel consumption following the
decomposition schema shown in Figure 9. The result is a plan that contains three actions:
SurveyTerrain, ForecastWeather, and ComputeFuelUsage. During the execution of
the action ForecastWeather, the FuelExpertAgent sends a query to the WeatherAgent,
which computes a weather forecast. The results returned by the WeatherAgent are passed
back to the FuelExpertAgent, which uses them to execute ComputeFuelUsage⁵. Finally,
the FuelExpertAgent reports the result to the planningAgent. The execution of
AskFuelConsumption sets the value of Consumption, which releases the suspension of the
reduction of the action Move, which is finally reduced.

One aspect is worth stressing: the computation of the fuel consumption is transparent
to the planningAgent, which is unaware that the FuelExpertAgent needs additional
information. Agents do not need to model how other agents solve problems or what they
require to solve a problem.

Above, we assumed that the planningAgent knows that the FuelExpertAgent is part
of the system. This assumption is too strong: RETSINA is an open system in which agents
join and leave dynamically. Agents advertise with a Matchmaker [2] when they join the
system and unadvertise when they leave. The advertisement is a declaration of what tasks
the agent can perform; whenever an agent wants to outsource parts of the computation,
it asks the Matchmaker for contact information of agents that can perform the task.
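The advertise / unadvertise / query cycle can be pictured as a registry keyed by capability. The Python sketch below is a toy under stated assumptions: capabilities are plain strings matched exactly, and the `Matchmaker` class and its methods are hypothetical names, not the RETSINA Matchmaker’s interface (which is not described in this paper).

```python
from collections import defaultdict


class Matchmaker:
    """Toy registry mapping advertised capabilities to provider agents."""

    def __init__(self):
        self._providers = defaultdict(set)    # capability -> agent names

    def advertise(self, agent, capabilities):
        for capability in capabilities:
            self._providers[capability].add(agent)

    def unadvertise(self, agent):
        for providers in self._providers.values():
            providers.discard(agent)

    def query(self, capability):
        """Return the agents currently advertising `capability`, sorted by name."""
        return sorted(self._providers.get(capability, ()))


matchmaker = Matchmaker()
matchmaker.advertise("FuelExpertAgent", {"FuelConsumption"})
matchmaker.advertise("WeatherAgent", {"WeatherForecast"})
```

A planningAgent that needs a fuel estimate would call `matchmaker.query("FuelConsumption")` and receive `["FuelExpertAgent"]`; once that agent unadvertises, the same query returns an empty list.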

Figure 10 shows the decomposition of the action AskFuelConsumption into two
sub-actions: findFuelExpert, a request to the Matchmaker for a reference to a fuel
expert agent, and AskAgent, a request to that fuel expert agent to provide the expected
consumption. Since Matchmakers are agents, a request to a Matchmaker is handled as
described above and does not require additional computational machinery.

5 Weather information is needed because tanks have different fuel consumption rates when they
travel on different terrains, e.g., dry soil vs. wet soil.



7 Monitoring Conditions during Planning 

The construction of a plan and the scheduling and execution of the plan’s actions take 
time. During this time the environment may change, invalidating the plan. In the example 
above, any change in the expected fuel consumption may lead the platoon to run out of 
fuel before its destination is reached. 

RETSINA agents implement Rationale Based Monitors [19] to detect changes in
the environment that are relevant to the plan. Specifically, constraints are the only means
for the planner to evaluate the validity of a plan: when a constraint fails, the plan is no
longer valid. RETSINA agents monitor the value of parameters and provisions that are
arguments of constraints in the plan; when one of the monitors detects a change, the
agent re-evaluates its constraints to verify whether the plan is still valid.

Agents decide which actions to monitor at planning time. When constraints are added
to the plan, the agent looks for the actions that set the arguments of the constraints;
then it transforms these actions into monitors: requests for information are transformed
into requests to monitor, while sensing actions become monitoring actions. Monitors are
implemented as periodic information-gathering actions that iterate until they are stopped
by the agent [21]. While virtually every AI planner uses exclusively “single shot” actions
that are removed from the plan as soon as they complete, RETSINA’s periodic actions are
not removed from the plan. Rather, at the end of their execution, they are reinstantiated
by the scheduler to run again. Monitors are stopped by the agent during plan execution
when they no longer have any associated constraint.
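A periodic monitor differs from a single-shot action only in that it is re-run instead of removed, and in that it stops once no constraint depends on it. The Python sketch below illustrates that loop under stated assumptions: the period (the wait between cycles) is omitted, `sense` is any callable returning the monitored value, and the `Monitor` class, `run_monitor` function, fuel readings and threshold are all hypothetical.

```python
class Monitor:
    """Periodic information-gathering action attached to plan constraints."""

    def __init__(self, variable, sense):
        self.variable = variable
        self.sense = sense             # callable returning the current value
        self.constraints = []          # predicates over the monitored value


def run_monitor(monitor, max_cycles):
    """Re-run the monitor each cycle instead of dropping it after one execution.

    Stops as soon as no constraint depends on the monitor; returns the cycles in
    which some constraint was violated (the points where the plan is re-evaluated).
    """
    violations = []
    for cycle in range(max_cycles):
        if not monitor.constraints:
            break
        value = monitor.sense()
        if not all(check(value) for check in monitor.constraints):
            violations.append(cycle)
    return violations


readings = iter([50.0, 40.0, 20.0])                    # hypothetical fuel readings
fuel = Monitor("Fuel", lambda: next(readings))
fuel.constraints.append(lambda level: level >= 30.0)    # hypothetical fuel constraint
```

With these readings, `run_monitor(fuel, 3)` flags only the third cycle, when the fuel level drops below the threshold; a monitor with no associated constraint stops immediately.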

The reaction to a change in the environment depends on the state of the plan. If the 
domain change violates a constraint in a partial plan that has outstanding flaws, then the 
partial plan is no longer expanded because it is not valid and the planner backtracks. 
If, instead, the change violates a constraint in an action that is scheduled for execution, 







A Planning Component for RETSINA Agents 159 



then the agent abandons the plan and constructs a new plan to fulfill the goal. Finally,
violations of constraints of actions already under execution are not considered: the agent
waits for the success or failure of the action.
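The reaction policy just described is a three-way case split on the state of the plan. The short Python sketch below restates it; the `Stage` enumeration and the returned labels are hypothetical names chosen for illustration.

```python
from enum import Enum, auto


class Stage(Enum):
    PARTIAL_PLAN = auto()     # the constraint belongs to a plan with outstanding flaws
    SCHEDULED = auto()        # ... to an action scheduled but not yet started
    EXECUTING = auto()        # ... to an action already under execution


def react_to_violation(stage):
    """Return the agent's response to a violated constraint, by plan state."""
    if stage is Stage.PARTIAL_PLAN:
        return "backtrack"    # stop expanding the invalid partial plan
    if stage is Stage.SCHEDULED:
        return "replan"       # abandon the plan and build a new one for the goal
    return "wait"             # let the running action succeed or fail
```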



8 Related Work 



HITaP has some similarities with Knoblock’s Sage [7], mainly in the concurrency of
planning and information gathering and the close connection between the planner and
the execution monitor through monitoring flaws. Nevertheless, the two planners differ
in many important respects: while Sage is a partial-order planner that extends UCPOP
[12], HITaP plans by task reduction rather than from first principles; in addition, HITaP
extends Sage’s functionality through the constant monitoring of the information gathered
and replanning when needed. Other planners relax the STRIPS omniscience assumption
by interleaving planning and the execution of information-gathering actions, e.g., [5].
Our approach differs from theirs. First, as in the case of Sage above, we use a
different planning paradigm: HTN instead of SNLP. In addition, we cannot adopt the
Local Closed World Assumption, because the information gathered might change while
planning. Moreover, our planner supports a coarse description of information sources
such as other agents: the planner needs to know what they provide, but not what their
requirements are, since each agent is able to scout for the information it needs.

ConGolog [1] extends the expressivity of HTN planning to constructs such as loops,
if structures and sequences, while dealing with incomplete knowledge. HITaP achieves
the same expressivity with a pure HTN approach: loops [21] are planned for and used to
implement monitoring actions, while if structures are implemented through
provision satisfaction.

HITaP also bears some similarities to PRS [4]. Both planners decompose abstract
tasks into primitive actions. Yet, they follow different planning algorithms: PRS is a
reactive planner, while HITaP attempts to forecast the effects of its actions to predict
(and avoid) failures before execution. Neither planning scheme is satisfactory. Reactive
planners cannot forecast the interaction of actions in the plan [9], since by the time the
interaction is detected it is no longer possible to backtrack. Deliberative planners
forecast the effects of their actions, but they require a complete plan before execution,
which makes the agent less sensitive to the environment in which it is operating. HITaP
mixes deliberative planning and execution of actions. While the implementation
discussed in this paper limits execution to information-gathering actions, the planner
could in principle execute any type of action. In future research we plan to extend
the planning scheme discussed here toward a planner that mixes both reactive and
deliberative planning.

HITaP’s contribution to Rationale Based Monitoring is twofold. HITaP describes
how Rationale Based Monitors can be applied to HTN planning. In addition, HITaP
extends the use of monitors to the execution phase, whereas the Rationale Based Monitors
in [19] are used only at planning time. On the other hand, HITaP’s use of monitors
is more restrictive than in [19], because RETSINA agents do not attempt to select the
best plan from a pool of alternative plans, given the modified environment.
HITaP is a first step towards a distributed planning scheme based on peer-to-peer
cooperation between agents that is not based on hierarchy or control relationships. Agents
have objectives to achieve, and they cooperate with one another to achieve their own goals.
We call this type of cooperation capability-based. From this perspective, our enterprise
is very different from the planning architecture proposed in [20], where the planning
process is centralized but the execution is distributed. Capability-based coordination does
not conflict with the team behavior described elsewhere in this book [14]. Rather, the
capability-based approach can contribute to team creation in two ways: first, it allows
flexible creation of teams based on the capabilities of the agents in the system, as opposed
to a rigid team construction in which it is specified which agents fill which roles. Second,
agents can participate in the construction of a team plan in which it is decided which
roles have to be filled and by which agent.

9 Conclusion 

Planning in a dynamic open MAS imposes a combination of problems that range from
partial domain information to dynamism of the environment. Existing planning approaches
address each of these problems separately, but no solution prior to our planner
addresses their combination. HITaP solves the problem of partial domain
knowledge by interleaving planning and execution of information-gathering actions;
it handles dynamic changes in the domain by monitoring for changes that may affect
planning and execution; it supports cooperation by allowing query delegation to other
agents; and it enables replanning when changes in the domain arise. These properties are
provided by the architecture described in this paper. This architecture is implemented in
RETSINA agents, which are deployed in several real-world environments [16,18].
Abstract. Large-scale open multi-agent systems, where agents need services of
other agents but may not know their contact information, require agent location
mechanisms. Solutions to this problem are usually based on middleware such as
matchmakers, brokers, yellow-pages agents and other middle agents. The disadvantage
of these is that they impose infrastructure, protocol and communication
overheads, and they do not easily scale up. We suggest a new approach to agent
location, which does not require middle agents or protocols for using them. Our
approach is simple and scales up with no infrastructure or protocol overheads;
thus it may be very useful for large-scale MAS. In this paper, we analytically study
the properties of our approach and discuss its advantages.



1 Introduction 

Multi-agent systems (MAS) are taking an increasing role in the solution of highly
distributed computational problems in dynamic, open domains. We assume that large-scale
open MAS will be an inevitable part of this trend. The size of such systems poses
problems which do not exist, or may be neglected, in small-scale MAS. These usually stem
from two major sources: (1) communication costs, which are commonly (at least)
polynomial in the number of agents, resulting in low performance; (2) task and resource
allocation, which requires the solution of an optimization problem of exponential complexity.

Several approaches were suggested to address these problems. For instance, the
complexity of the task allocation problem in MAS is reduced via coalitions of
bounded size [7]. An approach to large-scale MAS is presented in [10], where task
valuation and action selection are addressed. In other research, cooperation with reduced
communication is suggested [2]. Communication reduction is also discussed in [8],
where a mechanism for coordination in large-scale MAS with constant communication
complexity is presented. These (and other) suggested solutions to the problems above
refer, in many cases, to homogeneous agents. Yet, solutions that refer to agents with
heterogeneous capabilities assume that agents either know all other agents they need to
interact with (that is, a closed MAS), or are provided with some agent location
mechanism to find agents they need but do not know about in advance (e.g., middle agents
[1], matchmaking [5], facilitation [3, 6]).

* This material is based on work supported in part by MURI contract N00014-96-1222 and
CoABS DARPA contract F30602-98-2-0138.

N.R. Jennings and Y. Lesperance (Eds.): Intelligent Agents VI, LNAI 1757, pp. 162-172, 2000. 
In agreement with previous research, we, too, perceive agent location mechanisms
as necessary for open MAS. In such systems, agents with different expertise may need
other agents to provide them with services. However, they may not know the contact
information of the service providers. An agent location mechanism provides the agents
with this missing information. In small MAS, it is sometimes possible for agents to
maintain a list of all possible agents; in large-scale open MAS, however, this is
infeasible. Middle agent mechanisms for large-scale open MAS are suggested in [4], where
distributed matchmaking is presented. That solution, however, introduces two types of
overheads: (1) each communication operation going out to another agent is preceded
by communication with a matchmaker, and may also fire a series of communication
operations between the distributed matchmakers; (2) there is a need for additional
computational infrastructure, in terms of matchmaker agents as well as protocols for
other agents to use these matchmakers. Note that some types of middle agent
mechanisms support, in addition to agent location, semantics-based matching [9]. Semantic
matching is a complex problem, separable from the location problem², and is not
addressed in our research.

In this paper we suggest an agent location approach for large open MAS with no
need for middle agents, thus removing the second type of overhead. The rest of the
paper is organized as follows. In Section 2 we present the problem, and in Section 3
we introduce our approach to its solution. Section 4 describes the model that we use
for the analysis of our approach. In Section 5 we analyze the approach and compute
and present its advantageous properties. Finally, in Section 6, we conclude and present
open problems and future directions.

2 The Problem 

Assume an open MAS which includes heterogeneous agents, where the availability of the
agents varies and new agents may be added dynamically. Heterogeneity is expressed
in terms of different expertise and different capacities. The agents in the MAS need to
perform tasks. Tasks may be given in advance, but may also arrive dynamically. One of
the characteristics of a task is the expertise necessary for its performance. We assume
that agents may cache some information with regard to the attributes of other agents,
their availability and their location. However, we assume that this local information (and
in particular location and availability) may be incorrect due to the dynamics of the system
and of the environment in which it is deployed. We also assume that, at least in some
cases, an agent receives tasks that it cannot perform (due to incompatible expertise or
capacities) but that it nevertheless wants to have performed. This results in a need to
cooperate with other agents; in particular, it is necessary for an agent to either know or
be able to find agents that have the right expertise for the tasks it cannot perform.

Knowing other agents and being able to find them are supported in MAS in two 
major ways: 

- Agents maintain a list of all other agents. In closed systems, where all of the agents are known in advance, this is rather simple, although for very large systems it may be expensive in storage space. In open systems, where agents may dynamically appear and disappear, a list of all agents cannot be maintained. If all of the possible agents are known, it is possible to hold a list of all of them. However, if it is unknown which new agents may appear, no complete list can be constructed.

- In open MAS, middle agents [1] are a common agent location mechanism. These provide other agents with agent location services. An open MAS may have a single middle agent or multiple ones. In the first case, the location mechanism is centralized, and may therefore require a very large storage space and constitute a single point of failure. In the second case, a mechanism is needed to maintain some level of coherence among the multiple middle agents. Both cases incur the overhead of creating and maintaining middle agents, as well as protocols through which the other agents interact with them.

² For instance, there may be inter-semantics translation agents that do not provide location services.

In this paper, we stress that in large-scale open agent systems there is a solution that eliminates the need for middle agents, and thus the need to create and maintain them. In fact, we suggest that some of the middle agents' activity can be avoided at a very low cost to the rest of the agents. We also suggest that distributing the remaining activity among the other agents is simple to perform and yet provides a good agent location mechanism.

3 The Approach 

Our approach is rather simple: we require that each agent $i$ hold a list $L_i$ of other agents it knows. The list includes names, addresses, expertise and other relevant information about these agents. The list may change dynamically, but it is not necessarily up to date or correct: it is $i$'s incomplete, possibly inaccurate view of the rest of the agent community. In this paper we assume that changes are slow enough, and message delivery reliable enough, that the lists agents hold are mostly accurate and up to date, with only a small fraction of erroneous entries, although they are incomplete.
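A contact list $L_i$ might be represented as shown below. This is a minimal sketch. The field names (`name`, `address`, `expertise`), the address format and the helper methods are illustrative assumptions, because the paper does not prescribe a format. The list is treated as a local, possibly stale cache: entries can be added, refreshed or dropped when found to be wrong.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ContactEntry:
    """One entry of an agent's contact list L_i (fields are illustrative)."""
    name: str
    address: str                      # hypothetical format, e.g. "host:port"
    expertise: Set[str] = field(default_factory=set)


@dataclass
class ContactList:
    """An agent's local, incomplete and possibly inaccurate view of others."""
    entries: Dict[str, ContactEntry] = field(default_factory=dict)

    def add(self, entry: ContactEntry) -> None:
        # Add a new entry, or overwrite older information about the same agent.
        self.entries[entry.name] = entry

    def remove(self, name: str) -> None:
        # Drop an entry found to be erroneous (e.g. the agent has left).
        self.entries.pop(name, None)

    def find_expert(self, skill: str) -> Optional[ContactEntry]:
        # Return some locally known agent that claims the skill, or None.
        for entry in self.entries.values():
            if skill in entry.expertise:
                return entry
        return None
```

When `find_expert` returns `None`, the agent falls back on the recursive search through its contacts.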

Denote the number of agents in the system by $n$. In principle, $L_i$ may include all $n - 1$ other agents, but this is too costly when $n$ is very large. In an open MAS, it may also be impossible for an agent to know all of the agents all of the time. We suggest that agents hold lists $L_i$ such that $|L_i| \ll n$. When an agent needs to locate another agent whose location information is not in its local list, it consults some or all of the agents on its list for such information. These, in turn, perform the same procedure recursively. Motivating the agents to cooperate in agent location is not the focus of the research presented here, but if necessary one can devise a protocol to guarantee such cooperative behavior (e.g., via some payment scheme). A unique request ID prevents an agent from handling a request more than once and prevents request cycles, since it allows an agent to ignore a location request that originated from itself. In the worst case, this search covers the whole agent community, i.e., $n - 1$ agents, with communication complexity $O(n)$ for the whole system. This implies an average of $O(1)$ per agent, although the load is usually not evenly partitioned. The average case is much better, and by adding heuristics for discriminatively selecting agents from the contact list, a communication complexity of $O(1)$ can be achieved. However, we show that even without such heuristics, an appropriate choice of the size of $L_i$ results in a very low exploration depth, implying a very low communication complexity. Put simply, we will show that moderately sized contact lists $L_i$ allow agent location via very few communication operations, without any additional mediation services.
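The recursive procedure described above can be sketched as a small, single-process simulation. This is not the authors' protocol. It assumes the following:

- queries propagate synchronously;
- every contacted agent forwards the request to all of its contacts (breadth-first flooding with a depth limit);
- a per-agent set of seen request IDs plays the role of the unique request ID.

The names `Agent`, `locate`, `handled` and `max_depth` are hypothetical.

```python
import itertools
from collections import deque
from typing import List, Optional, Set, Tuple

# Hypothetical global source of unique request IDs.
_request_ids = itertools.count()


class Agent:
    def __init__(self, name: str, expertise: Set[str]) -> None:
        self.name = name
        self.expertise = expertise
        self.contacts: List["Agent"] = []  # the contact list L_i
        self.handled: Set[int] = set()     # request IDs this agent has seen


def locate(origin: Agent, skill: str,
           max_depth: int) -> Tuple[Optional[Agent], int, int]:
    """Search outward from `origin` for another agent offering `skill`.

    Returns (agent or None, depth at which it was found or -1, messages sent).
    """
    rid = next(_request_ids)
    origin.handled.add(rid)      # lets origin ignore its own request
    frontier = deque([(origin, 0)])
    messages = 0
    while frontier:
        agent, depth = frontier.popleft()
        # Breadth-first order: the first match is found at minimal depth.
        if agent is not origin and skill in agent.expertise:
            return agent, depth, messages
        if depth == max_depth:
            continue             # do not forward beyond the depth limit
        for peer in agent.contacts:
            messages += 1        # one location query is sent to peer
            if rid in peer.handled:
                continue         # duplicate or cycle: peer drops the query
            peer.handled.add(rid)
            frontier.append((peer, depth + 1))
    return None, -1, messages
```

Consider a chain a, b, c in which only c has the expertise. `locate(a, "translation", 5)` returns c at depth 2 after 3 query messages, because the query b sends back to a is dropped. Flooding finds the nearest expert but queries every contact. The heuristics mentioned above would prune `agent.contacts` before forwarding.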



4 The Model 



To illustrate the connections among agents as reflected by their contact lists, we represent the agent society as a directed graph. Each node $i$ in the graph represents agent $i$. Each edge $(i, j)$ represents the fact that $i$ holds contact information of $j$ in $L_i$, or, in simple words, that $i$ knows $j$. For simplicity of representation and analysis, we first consider a planar, undirected graph with a rectangular lattice pattern (see Figure 1). Such a graph represents an agent society in which each agent knows exactly its 4 nearest neighbours. Below, we analyze the properties of such a connection structure, and later draw conclusions regarding more complex structures. One may assume that, if $|L_i| = O(1)$, locating other agents for large $n$ will be very costly (in terms of communication) or even impossible (since there may be disconnected groups of agents). We shall examine this assumption through our analysis.




Fig. 1. A segment of a planar rectangular lattice structure connectivity graph.
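The lattice of Figure 1 can be built explicitly as a set of contact lists. The sketch below adds one assumption not made in the paper: the lattice wraps around (a torus). This removes the border nodes that the analysis chooses to ignore, so every agent has exactly $d = 4$ contacts and the edge count is exactly $e = 2n$. The names `torus_lattice` and `count_edges` are hypothetical.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]


def torus_lattice(rows: int, cols: int) -> Dict[Node, List[Node]]:
    """Contact lists for a rows x cols rectangular lattice with wraparound.

    Assumes rows, cols >= 3 so that the four neighbours are distinct.
    """
    graph: Dict[Node, List[Node]] = {}
    for r in range(rows):
        for c in range(cols):
            # Each agent knows its up, down, left and right neighbours.
            graph[(r, c)] = [((r - 1) % rows, c), ((r + 1) % rows, c),
                             (r, (c - 1) % cols), (r, (c + 1) % cols)]
    return graph


def count_edges(graph: Dict[Node, List[Node]]) -> int:
    # Each undirected edge {i, j} appears in both L_i and L_j.
    return sum(len(nbrs) for nbrs in graph.values()) // 2


g = torus_lattice(20, 20)
n = len(g)
```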



Denote the number of nodes by $n$, the number of edges by $e$ and the degree of a node by $d$. The distance between two nodes is the number of edges in the shortest path between them. In the planar, rectangular graph we study, $e = 2n$ and $d = 4$. We are interested in the average distance between nodes. This distance dictates the depth of the agent location search required by our approach and, correspondingly, the number of communication operations required. Without loss of generality, let us compute the average distance of all nodes from a specific node $A$. $A$ has 4 nearest neighbours at distance 1, 8 neighbours at distance 2 and, continuing in the same fashion, $4k$ neighbours at distance $k$. Since we assume that $n$ is very large, we are not interested in the particular shape of the borders of the graph. For large $n$, the portion of nodes that are close to the border is negligibly small (based on the ratio of perimeter to area, which converges to zero). Therefore, omitting nodes near the borders of the graph has little effect on the distance analysis. Hence, to simplify this analysis, we consider a graph in which there are exactly $4k$ nodes at distance $k$ from a center node (intuitively, this means that there are no “holes” within and no “rough” borders). Such a graph allows a simple expression of the relation between the number of nodes $n$ and the maximal distance from the center node (denote this distance by $m$), as follows:



$$ n = 1 + \sum_{i=1}^{m} 4i = 1 + 2m(m+1) \tag{1} $$



which is a sum over the center node and all of its neighbours at all distances. The average distance $\bar{l}$ from a center node, for any perfect planar lattice structure, is given by

$$ \bar{l} = \frac{\sum_{i=1}^{m} l_i n_i}{n - 1} \tag{2} $$

where $l_i$ is the $i$th distance and $n_i$ is the number of nodes at the $i$th distance. In particular, consider the rectangular planar lattice structure, where $l_i = i$ and $n_i = 4i$. By equation 1, $n - 1 = 2m(m+1)$, and substituting this into equation 2 gives

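$$ \bar{l} = \frac{\sum_{i=1}^{m} i \cdot 4i}{2m(m+1)} = \frac{2m(m+1)(2m+1)}{3 \cdot 2m(m+1)} = \frac{2m+1}{3} \tag{3} $$

which means, using equation 1 again, that

$$ \bar{l} \sim \sqrt{n} \tag{4} $$

To see this, note that for large $m$, $n \approx 2m^2$, so $m \approx \sqrt{n/2}$ and $\bar{l} \approx \frac{2}{3}\sqrt{n/2} = \frac{\sqrt{2n}}{3}$. This result holds for every perfect planar lattice structure as long as the degree of each node is a constant.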



In a three-dimensional lattice this result changes to $\bar{l} \sim \sqrt[3]{n}$, and this can be further generalized to a $k$-dimensional lattice, where $\bar{l} \sim \sqrt[k]{n}$. Note, however, that high dimensionality is undesirable, since it results in each node having a large number of adjacent nodes (for dimension $k$, this number is $2k$). For an agent community, this requires that every agent $i$ hold a large list $L_i$ of known agents, which is not always feasible.
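Equations 1 and 3 can be checked numerically. The following sketch (function names are illustrative) runs a breadth-first search from the center node of the idealized lattice of radius $m$, i.e. all lattice points within $m$ steps. It compares the node count and the exact average distance with the closed forms $1 + 2m(m+1)$ and $(2m+1)/3$. Exact rational arithmetic (`fractions.Fraction`) avoids floating-point comparisons.

```python
from collections import deque
from fractions import Fraction
from typing import Dict, Tuple

Node = Tuple[int, int]


def bfs_distances(m: int) -> Dict[Node, int]:
    """Distances from the center (0, 0) to every lattice node within m hops.

    This is the idealized graph with no holes and no rough borders:
    exactly 4k nodes at distance k, for k = 1..m.
    """
    dist: Dict[Node, int] = {(0, 0): 0}
    queue = deque([(0, 0)])
    while queue:
        x, y = queue.popleft()
        d = dist[(x, y)]
        if d == m:
            continue  # stay inside the radius-m region
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt not in dist:
                dist[nxt] = d + 1
                queue.append(nxt)
    return dist


def average_distance(m: int) -> Fraction:
    """Equation 2: sum of distances divided by n - 1 (requires m >= 1)."""
    dist = bfs_distances(m)
    return Fraction(sum(dist.values()), len(dist) - 1)


for m in range(1, 30):
    dist = bfs_distances(m)
    assert len(dist) == 1 + 2 * m * (m + 1)                # equation 1
    assert average_distance(m) == Fraction(2 * m + 1, 3)   # equation 3
```

For example, `average_distance(2)` is `Fraction(5, 3)`: 12 non-center nodes with total distance $4 \cdot 1 + 8 \cdot 2 = 20$.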

The observation that the average distance from a center node is $\sim \sqrt{n}$ is worrisome. It implies that the approach we propose may incur a search cost of $O(\sqrt{n})$, regardless of our choice of the (constant) size of the adjacency lists $L_i$. This complexity is unacceptable for large $n$.
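To give a sense of scale, the sketch below inverts equation 1 to find the radius $m$ of the largest idealized lattice with at most $n$ nodes. It then evaluates equation 3. The helper names are hypothetical. For $n = 10^6$ agents this gives $m = 706$ and an average search depth of $\bar{l} = 1413/3 = 471$ hops. The ratio $\bar{l}/\sqrt{n}$ approaches $\sqrt{2}/3 \approx 0.471$, consistent with equation 4.

```python
import math


def lattice_radius(n: int) -> int:
    """Largest m such that 1 + 2m(m+1) <= n (equation 1)."""
    m = 0
    while 1 + 2 * (m + 1) * (m + 2) <= n:
        m += 1
    return m


def mean_search_depth(n: int) -> float:
    """Average distance (2m+1)/3 from the center node (equation 3)."""
    m = lattice_radius(n)
    return (2 * m + 1) / 3


for n in (10**2, 10**4, 10**6):
    depth = mean_search_depth(n)
    print(f"n={n:>9,}  mean depth={depth:7.1f}  depth/sqrt(n)={depth / math.sqrt(n):.3f}")
```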

So far, we have learned that our approach requires a search to a depth of the average path length $\bar{l}$ and of breadth $|L_i|$. Our analysis shows that for perfect lattice structure connectivity graphs with large $n$, either $\bar{l}$ is too large or $|L_i|$ is too large. Hence, the proposed approach, applied to such structures, will fail due to the incurred complexity. Given these limitations, we need to address the following questions:

- What structural organizations, if any, can make both $\bar{l}$ and $|L_i|$ small enough to provide an acceptable search complexity for large $n$?

- Is any of these organizations applicable to MAS, and can it result in a good enough agent location mechanism?




